Personalized Query Auto-Completion Through a Lightweight Representation of the User Context

Manojkumar Rangasamy Kannadasan, eBay Inc., San Jose, CA (mkannadasan@ebay.com)
Grigor Aslanyan, eBay Inc., San Jose, CA (gaslanyan@ebay.com)

ABSTRACT

Query Auto-Completion (QAC) is a widely used feature in many domains, including web and eCommerce search. This feature suggests full queries based on a prefix of a few characters typed by the user. QAC has been extensively studied in the literature in recent years, and it has been consistently shown that adding personalization features can significantly improve the performance of the QAC model. In this work we propose a novel method for personalized QAC that uses lightweight embeddings learnt through fastText [2, 14]. We construct an embedding for the user's context queries, which are the last few queries issued by the user. We also use the same model to get the embeddings for the candidate queries to be ranked. We introduce ranking features that compute the distance between the candidate queries and the context queries in the embedding space. These features are then combined with other commonly used QAC ranking features to learn a ranking model using the state-of-the-art LambdaMART ranker [3]. We apply our method to a large eCommerce search engine (eBay) and show that the ranker with our proposed features significantly outperforms the baselines on all of the offline metrics measured, which include Mean Reciprocal Rank (MRR), Success Rate (SR), Mean Average Precision (MAP), and Normalized Discounted Cumulative Gain (nDCG). Our baselines include the Most Popular Completion (MPC) model, a commonly used baseline in the QAC literature, as well as a ranking model without our proposed features. The ranking model with the proposed features results in a 20-30% improvement over the MPC model on all metrics.
We obtain up to a 5% improvement over the baseline ranking model for all the sessions, which goes up to about 10% when we restrict to sessions that contain the user context. Moreover, our proposed features also significantly outperform text-based personalization features studied in the literature before, and adding text-based features on top of our proposed embedding-based features results only in minor improvements.

ACM Reference Format:
Manojkumar Rangasamy Kannadasan and Grigor Aslanyan. 2019. Personalized Query Auto-Completion Through a Lightweight Representation of the User Context. In Proceedings of the SIGIR 2019 Workshop on eCommerce (SIGIR 2019 eCom), 8 pages.

Copyright © 2019 by the paper's authors. Copying permitted for private and academic purposes. In: J. Degenhardt, S. Kallumadi, U. Porwal, A. Trotman (eds.): Proceedings of the SIGIR 2019 eCom workshop, July 2019, Paris, France, published at http://ceur-ws.org

1 INTRODUCTION

Query Auto-Completion (QAC) is a common feature of most modern search engines. It refers to the task of suggesting full queries after the user has typed a prefix of a few characters [6]. QAC can significantly reduce the number of characters typed [26], which is especially helpful to users on mobile devices. QAC can also help reduce the number of spelling errors in queries, and it can be of great help when the user is not sure how to formulate the query. It has been shown that QAC can greatly improve user satisfaction [24]. Moreover, it can reduce the overall search duration, resulting in a lower load on the search engine [1]. QAC currently has a wide range of applications, including search (web, eCommerce, email), databases, operating systems, and development environments [6].

Query Auto-Completion has been extensively studied in the literature in recent years. A detailed survey of the work prior to 2016 can be found in [6], which broadly classifies QAC approaches into two main categories: heuristic models and learning-based models. Heuristic models use a few different sources for each possible query completion and compute a final score; these approaches do not use a large variety of features. In contrast, learning-based approaches treat the problem as a ranking problem and rely on the extensive research in the learning-to-rank (LTR) field [15]. Learning-based approaches rely on a large number of features and generally outperform heuristic models [6]. The features for both kinds of approaches can be broadly characterized into three groups: time-sensitive, context-based, and demography-based. Time-sensitive features model the query popularity and its changes over time, such as weekly patterns. Demographic features, such as gender and age, are typically limited and may be hard to access. In contrast, context-based features rely on the user's previous search activity (short term as well as long term) to suggest new query completions. This data is essentially free, making context-based features an attractive approach for personalizing QAC. Context-based features for LTR models are the focus of this work.

In this paper we propose a novel method to learn query embeddings [2, 14] using a simple and scalable technique, and use it to measure the similarity between user context queries and candidate queries to personalize QAC. We learn the embeddings in a semi-supervised fashion using fastText, by treating all the queries in a session as a single document. We design features that measure the similarity between the context and candidate queries, which are then incorporated into a learning-to-rank model. We use the state-of-the-art LambdaMART model [3] for ranking candidate queries for QAC. Even though embedding-based features have been studied for QAC in the literature before, as discussed in Section 2, our work makes the following novel contributions:

• A lightweight and scalable way to represent the user's context in the embedding space.
• Simple and robust ranking features based on such embeddings for QAC, which can be used in any heuristic or LTR model.
• Training and evaluation of a pairwise LambdaMART ranker for QAC using the proposed features. We show that our proposed features result in significant improvements in offline metrics compared with state-of-the-art baselines.
• A comparison and combination of text-based features with embedding-based features, showing that embedding-based features result in larger improvements in offline metrics.

The rest of the paper is organized as follows. Section 2 discusses some of the related work in the literature. In Section 3 we describe our methodology. In Section 4 we describe our datasets and experiments. We summarize our work and discuss possible future research in Section 5.

2 RELATED WORK

The user's previously entered text is used for personalized QAC by Bar-Yossef and Kraus [1]. Their method, called NearestCompletion, computes the similarity of query completion candidates to the context queries (the user's previously entered queries), using term-weighted vectors for queries and contexts and applying cosine similarity. This method results in significant improvements in MRR. In addition, the authors proposed the MPC approach, which is based on the overall popularity of the queries matching the given prefix. MPC is a straightforward heuristic approach with good performance and is typically used as a baseline for more complex approaches. We use MPC as one of the baselines in this work as well.

The user's long-term search history is used in [5] to selectively personalize QAC, where a trade-off between query popularity and search context is used to encode the ranking signal. Jiang et al. [13] study user reformulation behavior using textual features. Shokouhi [22] studies QAC personalization using a combination of context-based textual features and demographic features, and shows that the user's long-term search history and location are the most effective for QAC personalization. Su et al. [25] propose the framework EXOS for personalizing QAC, which also relies on textual features (token level). Jiang et al. [8] use history-level, session-level, and query-level textual features for personalized QAC. Cai et al. [4] study features based on the observed and predicted search popularity, both for longer and shorter time periods, for learning personalized QAC. Diversification of personalized query suggestions is studied in [7].

Recurrent Neural Networks (RNNs) [11] have also been studied for QAC. Three RNN models, session-based, personalized, and attention-based, have been proposed in [12]. Fiorini and Lu [9] use user history based features as well as time features as input to an RNN model. [19] uses RNNs to specifically improve QAC performance on previously unseen queries. An adaptable language model is proposed in [10] for personalized QAC.

Word embeddings, such as word2vec [17], GloVe [20], and fastText [2, 14], have become increasingly popular in recent years for a large variety of tasks, including computing similarity between words. Embeddings have also been studied in the context of QAC. Specifically, Mitra [18] studies a Convolutional Latent Semantic Model for distributed representations of queries. Query similarity based on word2vec embeddings is studied in [21], where the features are combined with the MPC model. In Section 3 we explain our approach of learning embeddings for the user context in a simple and scalable fashion, and the usage of these embeddings together with text-based features to personalize QAC.

3 PERSONALIZED QUERY AUTO-COMPLETION WITH REFORMULATION

A search session is defined as a sequence of queries ⟨q_1, q_2, ..., q_T⟩ issued by a user within a particular time frame. A query consists of a set of tokens. If the user types a prefix p_T and ends up issuing the query q_T, then the user's context is the sequence of queries issued before time step T, ⟨q_1, q_2, ..., q_{T-1}⟩. For example, if the queries issued in a session are ⟨nike, adidas, shoes⟩ and the prefix used to issue the query shoes is sh, then ⟨q_1, q_2, ..., q_{T-1}⟩ = ⟨nike, adidas⟩, p_T = sh, and q_T = shoes. Given a prefix p_T, the user context ⟨q_1, q_2, ..., q_{T-1}⟩, and candidate queries Q_T matching the prefix, our goal is to rank the queries q ∈ Q_T such that we obtain the best possible ranking for q_T. The ranking score can be considered as P(Q_T | ⟨q_1, q_2, ..., q_{T-1}⟩). This can be solved using the learning-to-rank framework.

The influence of user context features on prediction accuracy has already been studied in [13, 18]. In this paper we propose a simple and scalable way to model the user's context using query embeddings, and use multiple distance-related features to compare the user's context to the candidate queries Q_T.

3.1 Learning Query Representation for Reformulation

Continuous text representations and embeddings can be learnt through both supervised [18] and semi-supervised approaches [2, 14, 17, 20]. In this paper we learn the query representations via semi-supervised techniques, using the publicly available fastText library [2, 14] for efficient representation learning. The fastText model learns subword representations while taking into account morphology: it considers subword units and represents a word by the sum of its character n-grams. The word iphone with character n-grams (n = 3) is represented as:

"⟨ip", "iph", "pho", "hon", "one", "ne⟩"
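As an illustration, the following minimal Python sketch reproduces this n-gram extraction; the angle brackets marking word boundaries follow the fastText convention, and the function name is ours:

```python
def char_ngrams(word, n=3):
    """Character n-grams of a word, including the fastText boundary
    symbols < and > that mark the start and end of the word."""
    padded = "<" + word + ">"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

# ['<ip', 'iph', 'pho', 'hon', 'one', 'ne>']
print(char_ngrams("iphone"))
```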
Some of the previous work learns a distinct vector representation for each word, thereby ignoring the internal structure of words [17]. If we have a dictionary of n-grams of size G, then the set of n-grams in a word w is denoted as G_w ⊂ {1, 2, ..., G}. We use the skipgram model, where the goal is to independently predict the presence or absence of the context words; the problem is framed as a binary classification task. For the word at position t we consider all context words as positive examples and sample negatives at random from the dictionary, as described in [2, 14]. For a context word w_c we use the negative log-likelihood ℓ: x ↦ log(1 + e^{-x}) for the binary logistic loss. The objective function is defined as:

$$\sum_{t=1}^{T} \Bigg[ \sum_{c \in \mathcal{C}_t} \ell\big(s(w_t, w_c)\big) + \sum_{n \in \mathcal{N}_{t,c}} \ell\big(-s(w_t, n)\big) \Bigg] \qquad (1)$$

where w_t is the target word, w_c is a context word, $\mathcal{C}_t$ is the set of context word positions, and $\mathcal{N}_{t,c}$ is a set of negative examples sampled from the vocabulary. The scoring function s(w, c) is defined as

$$s(w, c) = \sum_{g \in G_w} z_g^{\top} v_c \qquad (2)$$

where z_g is the vector representation of n-gram g of the word w and v_c is the vector representation of the context. Our goal is to learn scalable and lightweight embeddings for queries based on their reformulation behavior across different users. Here we represent all the queries issued in a session ⟨q_1, q_2, ..., q_T⟩ as one document in the training data. By learning the subword representations from the co-occurrence of words across queries issued in the same session, we obtain a simple and scalable way to encode the query reformulation behavior in the embedding space, while also capturing the syntactic and semantic similarity within the vocabulary.

We learn query representations by mining 3 days of eBay's search logs to get the query reformulations issued by users. The query log is segmented into sessions with a 30-minute session boundary, as followed in [13]. Based on this definition of a session boundary, we collect the different queries issued by each user within a session. We remove special characters in the queries and convert them to lowercase, and we filter out sessions containing only one query. If the user issues q_1, q_2, ..., q_T in a session, then all of these queries together, separated by whitespace, are considered as one user context. For example, if a session contains the 2 queries "iphone" and "iphone xs case", then a single document for training will be represented as "iphone iphone xs case".

For training the unsupervised character n-gram representations we consider each user context as one document sample. We fix the dimension of the subword representations to 50, the minimum occurrence of words in the vocabulary to 20, and use hierarchical softmax as the loss function; the other hyperparameters of the model are tuned based on the Embedding_Features model described in Section 4.3. The number of unique words in the vocabulary used to train the model is 189,138. The user context ⟨q_1, q_2, ..., q_T⟩ is then converted to multiple vector representations. Similar vector representations are also created for all candidate target queries in the dataset.
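A minimal sketch of this training setup with the fastText Python bindings, assuming the preprocessed session documents have been written one per line to a file named sessions.txt (the file name and the query-vector lookup at the end are our assumptions):

```python
import fasttext

# Each line of sessions.txt is one user context, e.g. "iphone iphone xs case".
# Hyperparameters follow the paper: skipgram model, 50-dimensional subword
# representations, words kept only if they occur at least 20 times, and
# hierarchical softmax loss. The remaining hyperparameters, which the paper
# tunes against the Embedding_Features model, are left at their defaults.
model = fasttext.train_unsupervised(
    "sessions.txt",
    model="skipgram",
    dim=50,
    minCount=20,
    loss="hs",
)

# Queries are embedded with the subword-aware sentence vector, so even
# queries unseen at training time still receive a representation.
v_query = model.get_sentence_vector("iphone xs case")
```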
3.2 User Context Features

In this section we propose different user context features based on the queries issued in the session. Vector representations are created both for the individual queries and for the entire context, taking all the queries in the session. v_C denotes the user context vector and v_{q_T} denotes the vector for one query at time step T. We develop four features based on the query representations learned in the previous section, which we denote as Embedding_Features. One embedding feature is based on all the queries in the context. Since the median number of searches in a session is approximately 3, we consider up to 3 queries previously issued by the user for generating the remaining embedding features. The Embedding_Features are computed as a distance between 2 vectors using cosine similarity [23]:

• user_context_cosine: Cosine distance between the user context vector v_C and the current target query vector v_{q_T}.
• prev_query1_cosine: Cosine distance between the query vector v_{q_{T-1}} and the current target query vector v_{q_T}.
• prev_query2_cosine: Cosine distance between the query vector v_{q_{T-2}} and the current target query vector v_{q_T}.
• prev_query3_cosine: Cosine distance between the query vector v_{q_{T-3}} and the current target query vector v_{q_T}.
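A sketch of how these four features could be computed from the trained model. The paper does not specify how the context vector v_C is aggregated, so this sketch assumes the fastText sentence vector of the whitespace-joined context queries, and it defaults a feature to the maximal distance of 1.0 when the context is shorter than 3 queries; both choices are our assumptions:

```python
import numpy as np

def cosine_distance(u, v):
    """Cosine distance 1 - cos(u, v); 0 means identical direction."""
    denom = float(np.linalg.norm(u) * np.linalg.norm(v))
    return 1.0 - float(np.dot(u, v)) / denom if denom > 0 else 1.0

def embedding_features(model, context_queries, candidate):
    """The four Embedding_Features for one (context, candidate) pair.
    `model` is the trained fastText model; `context_queries` is ordered
    from oldest to most recent."""
    v_cand = model.get_sentence_vector(candidate)
    v_ctx = model.get_sentence_vector(" ".join(context_queries))
    feats = {"user_context_cosine": cosine_distance(v_ctx, v_cand)}
    for k in (1, 2, 3):  # up to 3 previous queries, most recent first
        if k <= len(context_queries):
            v_prev = model.get_sentence_vector(context_queries[-k])
            feats[f"prev_query{k}_cosine"] = cosine_distance(v_prev, v_cand)
        else:
            feats[f"prev_query{k}_cosine"] = 1.0  # assumed default
    return feats
```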
In addition to the Embedding_Features, we also developed various Textual_Features comparing the user context and the current target query to be ranked, as defined in Table 1. We categorize them into three groups, namely Token, Query, and Session. There is a large overlap between the features defined in Table 1 and the features defined in [13, 22].

Table 1: Textual features based on the user context, defined across 3 categories. A subset of the features is highlighted in the table; the rest of the features are derived from them.

Category             | Examples
---------------------|---------------------------------------------------------
Token (16 features)  | ratio of new terms; ratio of used terms; average terms in previous queries; median terms in previous queries; trend of number of terms; unique terms added from last query; unique terms retained from last query; unique terms removed from last query; unique terms added from all previous queries; occurrence of terms in previous queries; frequency in previous queries
Query (7 features)   | character n-gram similarity with previous queries; token n-gram similarity with previous queries
Session (3 features) | position in session; unique terms in session; common terms in session

A query q_T can contain multiple tokens, and users may add and/or remove tokens between 2 consecutive queries in a session. This token reformulation behavior is encoded via the 16 Token-level features described in Table 1, representing the effectiveness of the tokens in the context C and the target query q_T. Similarly, the Query-level features represent how users reformulate the queries in a session through repetition and textual similarity between q_T and q_{T-1}. The Session-level features represent how users reformulate their queries without taking into account the relationship to the target query q_T.

4 EXPERIMENTS

4.1 Dataset and Experiment Setting

We conduct our ranking experiments on a large-scale search query dataset sampled from the logs of the eBay search engine. The query log is segmented into sessions with a 30-minute session boundary, as described in [13]. For the ranking experiments we do not filter out sessions containing a single query; this ensures that a single learning-to-rank model powers sessions both with and without user context. The dataset obtained based on the above logic results in about 53% of the sessions having user context, which gives us good coverage of user context features to learn a global model.

The labeling strategies used in [13, 22] assume there is at least one query in the context, remove target queries q_T not matching the prefix p_T, and set the first character of the prefix p_T based on q_T. In our method we use a slightly different labeling strategy for building the personalized QAC dataset. We sample a set of impressions from the search logs. For each issued query q_T, we capture the prefix p_T that led to the search page; this pair is marked as a positive label. For the same prefix p_T we identify the rest of the candidate queries Q_T \ q_T that were shown to the user and did not lead to the impression; they are marked as negative labels. We also retain sessions without user context. The resulting training data consists of labeled prefix-query pairs.

To evaluate the performance of the lightweight query representation of reformulations, we use LambdaMART [3], a boosted-tree version of LambdaRank, as our learning-to-rank algorithm. LambdaMART is considered one of the state-of-the-art learning-to-rank algorithms and won the Yahoo! Learning to Rank Challenge (2010) [13]. We use a pairwise ranking model and fine-tune our parameters based on the Baseline_Ranker defined in Section 4.2. We fix these parameters to train and evaluate our models across all of our experiments.
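The paper does not name a specific LambdaMART implementation. As one possibility, the following sketch uses LightGBM's lambdarank objective on the labeled prefix-query pairs; the feature matrix, labels, per-prefix group sizes, file names, and hyperparameter values are all illustrative assumptions:

```python
import numpy as np
import lightgbm as lgb

# X: one feature row per (prefix, candidate query) pair, holding the
#    Baseline_Features plus the context features of the chosen model.
# y: 1 for the query the user actually issued, 0 for the other candidates.
# groups: number of candidates per prefix impression, so the ranker only
#    compares candidates that were shown for the same prefix.
X = np.load("features.npy")        # hypothetical preprocessed inputs
y = np.load("labels.npy")
groups = np.load("group_sizes.npy")

ranker = lgb.LGBMRanker(
    objective="lambdarank",  # LambdaMART = LambdaRank + boosted trees
    n_estimators=500,        # illustrative values; the paper tunes its
    learning_rate=0.05,      # hyperparameters on the Baseline_Ranker
)
ranker.fit(X, y, group=groups)

# At serving time, the top-N MPC candidates for a prefix are re-ranked
# by sorting on ranker.predict over their feature rows.
```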
4.2 Baseline System

To evaluate our new personalized QAC ranker we establish two baseline ranking algorithms:

• MPC: The Most Popular Completion model [1] provides users with candidate queries ranked by the popularity of the query, where popularity is defined as the number of times the query has been issued by all users in the past.
• Baseline_Ranker: A learning-to-rank model built using the same methodology for creating and labeling the dataset. The features used in building the model are prefix features, target query features, and prefix-query features; we refer to these as Baseline_Features. The hyperparameters used for the LambdaMART model are exactly the same as in all the experiments for the personalized ranker.

4.3 Personalized Ranking Models

We have developed three personalized ranking models with different combinations of the user context features described in Section 3.2. These ranking models are compared against the two baseline rankers by experimentally evaluating the improvements on eBay datasets. The results are presented in Section 4.5.

• Textual: Ranker with Baseline_Features and Textual_Features representing the user context.
• Embedding: Ranker with Baseline_Features and Embedding_Features representing the user context.
• Textual_Embedding: Ranker with Baseline_Features, Textual_Features, and Embedding_Features representing the user context.

For all of the ranking models we first get the top N candidate queries from the MPC model and re-rank them with the ranking model. We show the full end-to-end architecture of the Textual_Embedding model in Figure 1. The architecture for the other models is similar, except that they only include a subset of the features.

[Figure 1: The end-to-end architecture of the Textual_Embedding model. The prefix is expanded by MPC into the top N candidates; Baseline_Features, Textual_Features, and Embedding_Features (derived from the fastText embeddings of the context) are fed to the ranking model, which produces the final ranked list. The architecture for the other models is similar, except that some of the features are excluded.]

4.4 Evaluation Metrics

The quality of our predictions is measured using the following metrics:

• Mean Reciprocal Rank (MRR): the average of the reciprocal ranks of the target queries in the QAC results. Given a test dataset S, the MRR for algorithm A is computed as

$$MRR(A) = \frac{1}{|S|} \sum_{(C_T, q_T) \in S} \frac{1}{hitRank(A, C_T, q_T)} \qquad (3)$$

where C_T represents the user context at time step T, q_T represents the relevant target query, and the function hitRank computes the rank of the relevant query in the ordering produced by algorithm A. The relevant query refers to the clicked query in QAC.

• Success Rate at Top-K (SR@K): the average percentage of relevant queries ranked at or above position K in the ranked list from QAC. In this paper we consider only SR@1, SR@2, and SR@3.

• Normalized Discounted Cumulative Gain (nDCG): the Discounted Cumulative Gain (DCG) represents the usefulness, or gain, of a query based on its position in the ranked list from QAC, penalizing the relevance logarithmically with position:

$$DCG_q = \sum_{i=1}^{P} \frac{2^{rel_i} - 1}{\log_2(i + 1)} \qquad (4)$$

where i denotes the rank and rel_i is the relevance of the query at rank i.
For our purposes rel_i takes values 0 or 1. nDCG is defined as normalized DCG, namely the ratio of DCG to IDCG (ideal DCG):

$$nDCG_q = \frac{DCG_q}{IDCG_q} \qquad (5)$$

where IDCG_q is the maximum possible value of DCG_q for any ranker. The overall performance of a ranking algorithm A is measured by the average nDCG across all queries in the dataset:

$$nDCG = \frac{\sum_{q=1}^{Q} nDCG_q}{Q} \qquad (6)$$

• Mean Average Precision (MAP): the mean of the average precision scores for each query across the entire set of queries:

$$MAP = \frac{\sum_{q=1}^{Q} AvgPrecision(q)}{Q} \qquad (7)$$

4.5 Results

We perform our evaluation in two phases. First, we evaluate the quality of our query representations. Second, we evaluate the user context embeddings against the user context based textual features using a learning-to-rank framework [3].

To evaluate our query representations we sample a few words across different verticals, like fashion, electronics, and home and garden, to evaluate whether the embeddings capture the syntactic and semantic knowledge of the queries learnt from the query reformulation behavior. We use t-SNE [16] to visualize the embeddings for these sampled queries and show that words like samsung, galaxy, and tv are close to each other in the embedding space and far from queries like adidas and iphone. This verifies that our query embeddings have good subword information to represent the user context in the embedding space. The t-SNE plot for a small sample of queries is shown in Figure 2.

[Figure 2: A three-dimensional t-SNE plot using the vectors learned from user query reformulation, showing how similar intent words are modeled in the embedding space.]

Offline metrics MRR, SR@K, nDCG, MAP, and MAP@K are shown in Table 2, where we have normalized the metrics with respect to the MPC model. We show results for the whole test dataset, which includes queries with and without user context, as well as for the dataset with user context only. To assess statistical significance we have performed 1,000 bootstrap samples over the test queries and computed 95% confidence intervals from those samples. The metrics, together with the 95% confidence intervals, are plotted in Figure 3, where the plots on the left are for the whole dataset and the plots on the right are for the context only dataset. We plot one variant of each metric since the others are very similar.

Table 2: Offline evaluation metrics. We show the ratio of the metrics for four of the ranking models to the MPC model on both test datasets: all data, and data filtered to have full coverage of the user context.

Whole dataset:
Measure | Baseline_Ranker | Textual_Features | Embedding_Features | Textual_Embedding_Features
MRR     | 1.26 | 1.30 | 1.31 | 1.31
nDCG    | 1.30 | 1.32 | 1.33 | 1.33
SR@1    | 1.24 | 1.29 | 1.31 | 1.31
SR@3    | 1.23 | 1.26 | 1.27 | 1.27
MAP     | 1.26 | 1.30 | 1.31 | 1.31
MAP@1   | 1.24 | 1.29 | 1.31 | 1.31
MAP@3   | 1.23 | 1.27 | 1.29 | 1.29

User context only dataset:
Measure | Baseline_Ranker | Textual_Features | Embedding_Features | Textual_Embedding_Features
MRR     | 1.27 | 1.34 | 1.37 | 1.37
nDCG    | 1.30 | 1.35 | 1.37 | 1.37
SR@1    | 1.26 | 1.37 | 1.42 | 1.41
SR@3    | 1.23 | 1.30 | 1.33 | 1.34
MAP     | 1.27 | 1.34 | 1.37 | 1.37
MAP@1   | 1.26 | 1.37 | 1.42 | 1.41
MAP@3   | 1.24 | 1.33 | 1.37 | 1.37

[Figure 3: Metrics ratio to MPC (MRR, MAP@3, SR@3, and nDCG panels) for the Baseline, Textual, Embedding, and Textual_Embedding models, on the whole dataset (left) and the user context only dataset (right). Error bars are computed using 1,000 bootstrap samples of the test queries.]
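For concreteness, a sketch of these metrics for the setting used here, where each test prefix has exactly one relevant (clicked) query and rel_i ∈ {0, 1}; the array of 1-based clicked-query positions is our assumed input:

```python
import numpy as np

def qac_metrics(ranks, k=3):
    """Offline QAC metrics when each test prefix has exactly one
    relevant (clicked) query at the 1-based position given in `ranks`."""
    ranks = np.asarray(ranks, dtype=float)
    rr = 1.0 / ranks
    return {
        "MRR": rr.mean(),
        f"SR@{k}": (ranks <= k).mean(),
        # With binary relevance and a single relevant query, IDCG = 1,
        # so nDCG reduces to the discount at the clicked position.
        "nDCG": (1.0 / np.log2(ranks + 1)).mean(),
        # With a single relevant query, AvgPrecision(q) = 1/rank,
        # so MAP coincides with MRR (consistent with Table 2).
        "MAP": rr.mean(),
        f"MAP@{k}": np.where(ranks <= k, rr, 0.0).mean(),
    }

print(qac_metrics([1, 2, 5, 1, 3]))
```

A sketch of the percentile bootstrap behind the error bars, reusing the per-query clicked ranks above (the function, sample size, and seed are our assumptions):

```python
import numpy as np

def bootstrap_ci(ranks, metric, n_samples=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for a rank-based metric:
    resample the test queries with replacement n_samples times and take
    the empirical (alpha/2, 1 - alpha/2) quantiles of the metric."""
    rng = np.random.default_rng(seed)
    stats = np.empty(n_samples)
    for i in range(n_samples):
        sample = rng.choice(ranks, size=len(ranks), replace=True)
        stats[i] = metric(sample)
    return tuple(np.quantile(stats, [alpha / 2, 1 - alpha / 2]))

# Example: 95% confidence interval for MRR.
ranks = np.array([1, 2, 5, 1, 3])
lo, hi = bootstrap_ci(ranks, lambda r: (1.0 / r).mean())
```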
Our results show that all of the LTR models result in 20-30% improvements over the MPC model. All three models with contextualization features outperform the Baseline_Ranker on all metrics in a statistically significant way. For example, for MAP@3, Embedding outperforms Baseline_Ranker by 5% on the whole dataset and by 10% on the context only dataset. The Embedding model also outperforms Textual, with an improvement in MAP@3 of 1.5% on the whole dataset and 3% on the context only dataset. The Textual_Embedding model performs very similarly to Embedding, which implies that the embedding-based features proposed in this work capture all of the information in the textual features (from the perspective of the ranking model), and provide additional significant improvements.

4.6 Feature Analysis

In this section we analyze the user context embedding features through the partial dependence plots shown in Figure 4. The partial dependence plot for the user_context_cosine feature clearly indicates that the cosine similarity between the user context ⟨q_1, q_2, ..., q_{T-1}⟩ and the target query q_T has a linear relationship with the target. The embedding features based on the individual time steps (prev_query1_cosine, prev_query2_cosine, prev_query3_cosine) also show a clear monotonic relationship.

[Figure 4: Partial dependence plots for user context embedding features learnt from query reformulation.]

5 SUMMARY AND FUTURE WORK

In this work we have presented a simple and scalable approach to learning lightweight vector representations (embeddings) for the query reformulations in a user session. These query representations exhibit both syntactic and semantic similarities between different queries and enable us to model the user context seamlessly. We have leveraged these lightweight embeddings to represent the user context in our personalized ranker for Query Auto-Completion. Different combinations of user context features are created, including textual and embedding features, on top of our baseline ranker. We have applied these personalization features to a large-scale commercial search engine (eBay) and experimentally verified significant improvements on all the offline ranking metrics. We have evaluated our personalized ranker on the entire dataset and on a dataset restricted to sessions containing the user context, and we see the biggest improvements on the user context filtered dataset. Furthermore, we show that the ranking model with embedding features outperforms the model with textual features, whereas the model with combined textual and embedding features results in only minor improvements on top of the model with embedding features alone; the minor improvements from the textual features are likely due to the session-level features, which are agnostic of the queries in the context. As future work, we would like to explore different representation learning techniques, like sent2vec, doc2vec, and sequence models, to understand the user context better and incorporate them into the personalized ranker. We also plan to explore the trade-offs between short-term and long-term user contexts in QAC. Lastly, the user context vectors provide a simple and scalable way to understand user sessions, which can be utilized for personalizing different parts of search and recommender systems.

REFERENCES

[1] Ziv Bar-Yossef and Naama Kraus. 2011. Context-sensitive Query Auto-completion. In Proceedings of the 20th International Conference on World Wide Web (WWW '11). ACM, New York, NY, USA, 107-116. https://doi.org/10.1145/1963405.1963424
[2] Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics 5 (2017), 135-146.
[3] Christopher J. C. Burges. 2010. From RankNet to LambdaRank to LambdaMART: An overview. Learning 11, 23-581 (2010), 81.
[4] Fei Cai, Wanyu Chen, and Xinliang Ou. 2017. Learning search popularity for personalized query completion in information retrieval. Journal of Intelligent & Fuzzy Systems 33, 4 (2017), 2427-2435.
[5] Fei Cai and Maarten de Rijke. 2016. Selectively Personalizing Query Auto-Completion. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '16). ACM, New York, NY, USA, 993-996. https://doi.org/10.1145/2911451.2914686
[6] Fei Cai and Maarten de Rijke. 2016. A Survey of Query Auto Completion in Information Retrieval. Foundations and Trends® in Information Retrieval 10, 4 (2016), 273-363. https://doi.org/10.1561/1500000055
[7] Wanyu Chen, Fei Cai, Honghui Chen, and Maarten de Rijke. 2017. Personalized query suggestion diversification. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 817-820.
[8] Jiang Danyang, Fei Cai, and Honghui Chen. 2018. Personalizing Query Auto-Completion for Multi-Session Tasks. 203-207. https://doi.org/10.1109/CCET.2018.8542201
[9] Nicolas Fiorini and Zhiyong Lu. 2018. Personalized neural language models for real-world query auto completion. arXiv preprint arXiv:1804.06439 (2018).
[10] Aaron Jaech and Mari Ostendorf. 2018. Personalized language model for query auto-completion. arXiv preprint arXiv:1804.09661 (2018).
[11] L. C. Jain and L. R. Medsker. 1999. Recurrent Neural Networks: Design and Applications (1st ed.). CRC Press, Inc., Boca Raton, FL, USA.
[12] D. Jiang, W. Chen, F. Cai, and H. Chen. 2018. Neural Attentive Personalization Model for Query Auto-Completion. In 2018 IEEE 3rd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC). 725-730. https://doi.org/10.1109/IAEAC.2018.8577694
[13] Jyun-Yu Jiang, Yen-Yu Ke, Pao-Yu Chien, and Pu-Jen Cheng. 2014. Learning User Reformulation Behavior for Query Auto-completion. In Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR '14). ACM, New York, NY, USA, 445-454. https://doi.org/10.1145/2600428.2609614
[14] Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomas Mikolov. 2016. Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759 (2016).
[15] Tie-Yan Liu. 2009. Learning to Rank for Information Retrieval. Foundations and Trends® in Information Retrieval 3, 3 (2009), 225-331. https://doi.org/10.1561/1500000016
[16] Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research 9, Nov (2008), 2579-2605.
[17] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed Representations of Words and Phrases and Their Compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2 (NIPS'13). Curran Associates Inc., USA, 3111-3119. http://dl.acm.org/citation.cfm?id=2999792.2999959
[18] Bhaskar Mitra. 2015. Exploring session context using distributed representations of queries and reformulations. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 3-12.
[19] Dae Hoon Park and Rikio Chiba. 2017. A neural language model for query auto-completion. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 1189-1192.
[20] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. In Empirical Methods in Natural Language Processing (EMNLP). 1532-1543. http://www.aclweb.org/anthology/D14-1162
[21] Taihua Shao, Honghui Chen, and Wanyu Chen. 2018. Query Auto-Completion Based on Word2vec Semantic Similarity. In Journal of Physics: Conference Series, Vol. 1004. IOP Publishing, 012018.
[22] Milad Shokouhi. 2013. Learning to Personalize Query Auto-completion. In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '13). ACM, New York, NY, USA, 103-112. https://doi.org/10.1145/2484028.2484076
[23] Amit Singhal et al. 2001. Modern information retrieval: A brief overview. IEEE Data Eng. Bull. 24, 4 (2001), 35-43.
[24] Yang Song, Dengyong Zhou, and Li-wei He. 2011. Post-ranking Query Suggestion by Diversifying Search Results. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '11). ACM, New York, NY, USA, 815-824. https://doi.org/10.1145/2009916.2010025
[25] F. Su, M. Somaiya, S. Mishra, and R. Mukherjee. 2015. EXOS: Expansion on session for enhancing effectiveness of query auto-completion. In 2015 IEEE International Conference on Big Data (Big Data). 1154-1163. https://doi.org/10.1109/BigData.2015.7363869
[26] Aston Zhang, Amit Goyal, Weize Kong, Hongbo Deng, Anlei Dong, Yi Chang, Carl A. Gunter, and Jiawei Han. 2015. adaQAC: Adaptive Query Auto-Completion via Implicit Negative Feedback. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '15). ACM, New York, NY, USA, 143-152. https://doi.org/10.1145/2766462.2767697