<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Personalized Query Auto-Completion Through a Lightweight Representation of the User Context</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Manojkumar Rangasamy Kannadasan</string-name>
          <email>mkannadasan@ebay.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Grigor Aslanyan</string-name>
          <email>gaslanyan@ebay.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>eBay Inc.</institution>
          ,
          <addr-line>San Jose, CA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <abstract>
<p>Query Auto-Completion (QAC) is a widely used feature in many domains, including web and eCommerce search. This feature suggests full queries based on a prefix of a few characters typed by the user. QAC has been extensively studied in the literature in recent years, and it has been consistently shown that adding personalization features can significantly improve the performance of the QAC model. In this work we propose a novel method for personalized QAC that uses lightweight embeddings learnt through fastText [2, 14]. We construct an embedding for the user context queries, which are the last few queries issued by the user. We also use the same model to get the embeddings for the candidate queries to be ranked. We introduce ranking features that compute the distance between the candidate queries and the context queries in the embedding space. These features are then combined with other commonly used QAC ranking features to learn a ranking model using the state-of-the-art LambdaMART ranker [3]. We apply our method to a large eCommerce search engine (eBay) and show that the ranker with our proposed features significantly outperforms the baselines on all of the offline metrics measured, which include Mean Reciprocal Rank (MRR), Success Rate (SR), Mean Average Precision (MAP), and Normalized Discounted Cumulative Gain (NDCG). Our baselines include the Most Popular Completion (MPC) model, a commonly used baseline in the QAC literature, as well as a ranking model without our proposed features. The ranking model with the proposed features results in a 20-30% improvement over the MPC model on all metrics. We obtain up to a 5% improvement over the baseline ranking model for all sessions, which goes up to about 10% when we restrict to sessions that contain the user context. Moreover, our proposed features also significantly outperform text based personalization features studied in the literature before, and adding text based features on top of our proposed embedding based features results in only minor improvements.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Copyright © 2019 by the paper’s authors. Copying permitted for private and academic
purposes.</p>
      <p>In: J. Degenhardt, S. Kallumadi, U. Porwal, A. Trotman (eds.):
Proceedings of the SIGIR 2019 eCom workshop, July 2019, Paris, France, published at
http://ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>1 INTRODUCTION</title>
      <p>
        Query Auto-Completion (QAC) is a common feature of most
modern search engines. It refers to the task of suggesting full queries
after the user has typed a prefix of a few characters [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. QAC can
significantly reduce the number of characters typed [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ], which is
especially helpful to users on mobile devices. QAC can also help
reduce the number of spelling errors in queries, and in cases where the
user is not sure how to formulate the query, QAC can be of
great help. It has been shown that QAC can greatly improve user
satisfaction [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. Moreover, it can reduce the overall search duration,
resulting in a lower load on the search engine [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Today QAC
has a wide range of applications, including search (web,
eCommerce, email), databases, operating systems, and development
environments [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Query Auto-Completion has been extensively studied in the
literature in recent years. A detailed survey of the work prior to
2016 can be found in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which broadly classifies QAC approaches
into two main categories - heuristic models and learning based
models. Heuristic models use a few different sources for each
possible query completion and compute a final score. These approaches
do not use a large variety of features. In contrast, learning based
approaches treat the problem as a ranking problem and rely on the
extensive research in the literature in the learning-to-rank (LTR)
field [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Learning based approaches rely on a large number of
features and generally outperform heuristic models [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The
features for both of these approaches can be broadly characterized
into three groups - time-sensitive, context-based, and
demographic-based. Time-sensitive features model the query popularity and its
changes over time, such as weekly patterns. Demographic-based
features, such as gender and age, are typically limited and may
be hard to access. In contrast, context-based features rely on the
user’s previous search activity (short term as well as long term) to
suggest new query completions. This data is essentially free,
making context-based features an attractive approach for personalizing
QAC. Context-based features for LTR models will be the focus of
this work.
      </p>
      <p>
        In this paper we propose a novel method to learn query
embeddings [
        <xref ref-type="bibr" rid="ref14 ref2">2, 14</xref>
        ] using a simple and scalable technique and use it
to measure similarity between user context queries and candidate
queries to personalize QAC. We learn the embeddings in a
semi-supervised fashion using fastText by treating all the queries in a
session as a single document. We design features that measure the
similarity between the context and candidate queries, which are
then incorporated into a learning-to-rank model. We use the
state-of-the-art LambdaMART model [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] for ranking candidate queries
for QAC. Even though embedding based features have been studied
for QAC in the literature before, as discussed in Section 2, our work
makes the following novel contributions:
• A lightweight and scalable way to represent the user’s
context in the embedding space.
• Simple and robust ranking features based on such
embeddings for QAC, which can be used in any heuristic or LTR
model.
• Training and evaluation of a pairwise LambdaMART ranker
for QAC using the proposed features. We show that our
proposed features result in significant improvements in offline
metrics compared with state-of-the-art baselines.
• A comparison and combination of text based features with
embedding based features, showing that embedding based
features result in larger improvements in offline metrics.
      </p>
      <p>The rest of the paper is organized as follows. Section 2 discusses
some of the related work in the literature. In Section 3 we describe
our methodology. In Section 4 we describe our datasets and
experiments. We summarize our work and discuss possible future
research in Section 5.</p>
    </sec>
    <sec id="sec-3">
      <title>2 RELATED WORK</title>
      <p>
        The user’s previously entered text is used for personalized QAC by
Bar-Yossef and Kraus [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Their method, called NearestCompletion,
computes the similarity of query completion candidates to the
context queries (the user’s previously entered queries), using
term-weighted vectors for queries and contexts and applying cosine
similarity. This method results in significant improvements in MRR.
In addition, the authors proposed the MPC approach, which is
based on the overall popularity of the queries matching the given
prefix. MPC is a straightforward heuristic approach with good
performance and is typically used as a baseline for more complex
approaches. We use MPC as one of the baselines in this work as
well.
      </p>
      </p>
      <p>
        The user’s long term search history is used in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] to selectively
personalize QAC, where a trade-off between query popularity and
search context is used to encode the ranking signal. Jiang et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]
study user reformulation behavior using textual features. Shokouhi
[
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] studies QAC personalization using a combination of context
based textual features and demographic features, and shows that the
user’s long term search history and location are the most effective
for QAC personalization. Su et al. [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] propose the framework
EXOS for personalizing QAC, which also relies on textual features
(token level). Jiang et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] use history-level, session-level, and
query-level textual features for personalized QAC. Fei et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
study features based on the observed and predicted search popularity
over both longer and shorter time periods for learning personalized
QAC. Diversification of personalized query suggestions is studied in
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        Recurrent Neural Networks (RNNs) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] have also been studied
for QAC. Three RNN models - session-based, personalized, and
attention-based - have been proposed in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Fiorini and Lu [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]
use user history based features as well as time features as input
to an RNN model. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] uses RNNs to specifically improve QAC
performance on previously unseen queries. An adaptable language
model is proposed in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] for personalized QAC.
      </p>
      <p>
        Word embeddings, such as word2vec [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], glove [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], and fastText
[
        <xref ref-type="bibr" rid="ref14 ref2">2, 14</xref>
        ], have become increasingly popular in recent years for
a large variety of tasks, including computing similarity between
words. Embeddings have also been studied in the context of QAC.
Specifically, Mitra [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] studies a Convolutional Latent Semantic
Model for distributed representations of queries. Query similarity
based on word2vec embeddings is studied in [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] where the features
are combined with the MPC model. In Section 3 we explain our
approach for learning embeddings for the user context in a simple
and scalable fashion and the usage of these embeddings and text
based features to personalize QAC.
      </p>
    </sec>
    <sec id="sec-4">
      <title>3 PERSONALIZED QUERY AUTO-COMPLETION WITH REFORMULATION</title>
      <p>A search session is defined as a sequence of queries ⟨q1, q2, . . . , qT ⟩
issued by a user within a particular time frame. A query consists of
a set of tokens. If the user types a prefix pT and ends up issuing the
query qT , then the user’s context is the sequence of queries issued before
time step T , ⟨q1, q2, . . . , qT −1⟩. For example, if the queries issued in
a session are ⟨nike, adidas, shoes⟩ and the prefix used to issue the query
shoes is sh, then ⟨q1, q2, . . . , qT −1⟩ = ⟨nike, adidas⟩, pT = sh, qT =
shoes. Given a prefix pT , the user context ⟨q1, q2, . . . , qT −1⟩, and
candidate queries QT matching the prefix, our goal is to provide a ranking
of the queries q ∈ QT such that qT is ranked as high as possible.
The ranking score can be considered as P (QT |⟨q1, q2, . . . , qT −1⟩).
This can be solved using the learning-to-rank framework.</p>
      <p>
        The influence of user context features on prediction
accuracy has already been studied in [
        <xref ref-type="bibr" rid="ref13 ref18">13, 18</xref>
        ]. In this paper we
propose a simple and scalable way to understand the user’s context
using query embeddings and use multiple distance related features
to compare the user’s context to the candidate queries QT .
      </p>
    </sec>
    <sec id="sec-7">
      <title>3.1 Learning Query Representation for Reformulation</title>
      <p>
        Continuous text representations and embeddings for a text can
be learnt through both supervised [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] and semi-supervised
approaches [
        <xref ref-type="bibr" rid="ref14 ref17 ref2 ref20">2, 14, 17, 20</xref>
        ]. In this paper we learn the query
representations via semi-supervised techniques, using the publicly
available fastText library [
        <xref ref-type="bibr" rid="ref14 ref2">2, 14</xref>
        ] for efficient representation learning.
The fastText model learns subword
representations while taking into account morphology. The model
considers subword units and represents a word by the sum of its
character n-grams. The word iphone with character n-grams (n = 3)
is represented as:
      </p>
      <p>“⟨ip”, “iph”, “pho”, “hon”, “one”, “ne⟩”</p>
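The boundary-marked decomposition above can be reproduced with a short function (a sketch for illustration; fastText uses the ⟨ and ⟩ boundary symbols, written here as &lt; and &gt;):

```python
def char_ngrams(word, n=3):
    # fastText-style subword units: add boundary markers,
    # then slide an n-character window over the marked word
    marked = "<" + word + ">"
    return [marked[i:i + n] for i in range(len(marked) - n + 1)]

print(char_ngrams("iphone"))  # the six trigrams listed above
```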
      <p>
        Some of the previous work learns distinct vector representations
for words, thereby ignoring the internal structure of the words [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
If we have a dictionary of n-grams of size G, then the set of n-grams
in a word w is denoted as Gw ⊂ {1, 2, . . . , G }. We use the skipgram
model, where the goal is to independently predict the presence or
absence of the context words. The problem is framed as a binary
classification task. For the word at position t we consider all context
words as positive examples and sample negatives at random from
the dictionary, as described in [
        <xref ref-type="bibr" rid="ref14 ref2">2, 14</xref>
        ]. For a context word wc we
use the negative log likelihood l : x ↦ log(1 + e−x ), the binary
logistic loss. The objective function is defined as:
      </p>
      <p>∑_{t=1}^{T} [ ∑_{c ∈ Ct} l (s(wt , wc )) + ∑_{n ∈ Nt,c} l (−s(wt , n)) ] (1)
where wt is the target word, wc is a context word, Ct is the set of
context words for position t, and Nt,c is a set
of negative examples sampled from the vocabulary. The scoring
function s(w, c) is defined as
s(w, c) = ∑_{g ∈ Gw} zg⊤ vc (2)
where zg is the vector representation of each n-gram g of a word
w and vc is the vector representation of the context. Our goal is
to learn scalable and lightweight embeddings for queries based
on their reformulation behavior across different users. Here we
represent all the queries issued in a session ⟨q1, q2, . . . , qT ⟩ as one
document in the training data. By learning subword
representations using the probability of words in the context of other words
present in the queries issued in the same session, we are able to
provide a simple and scalable way to encode the query
reformulation behavior in the embedding space. We are also able to learn
syntactic and semantic similarity across the vocabulary.</p>
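The scoring function in Eq. (2) and the logistic loss l can be sketched in a few lines of plain Python (toy vectors for illustration; a real implementation sums over learned n-gram embedding rows):

```python
import math

def dot(u, v):
    # inner product of two equal-length vectors
    return sum(a * b for a, b in zip(u, v))

def score(ngram_vectors, context_vector):
    # Eq. (2): s(w, c) = sum over n-grams g of w of z_g . v_c
    return sum(dot(z_g, context_vector) for z_g in ngram_vectors)

def logistic_loss(x):
    # l(x) = log(1 + e^{-x}); Eq. (1) applies it to s(w_t, w_c) for
    # positive context words and to -s(w_t, n) for sampled negatives
    return math.log(1.0 + math.exp(-x))
```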
      <p>
        We learn query representations by mining 3 days of eBay’s search
logs to get the query reformulations issued by users. The query
log is segmented into sessions with a 30 minute session boundary,
as in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Based on this definition of a session boundary,
we collect the different queries issued by each user within a session.
We remove special characters in the queries and convert them to
lowercase. We also filter out sessions with only one query.
If the user issues q1, q2, . . . , qT in a session,
then all of these queries ⟨q1, q2, . . . , qT ⟩ together, separated by
whitespace, are considered as one user context. For example, if a
session contains the 2 queries “iphone” and “iphone xs
case”, then the corresponding training document is
“iphone iphone xs case”.
      </p>
      <p>For training unsupervised character n-gram representations
we consider each user context as one document sample. We fix
the dimension of the subword
representations to 50, the minimum occurrence of words in the
vocabulary to 20, and use hierarchical softmax as the loss function.
The other hyperparameters of the model are tuned based on the
Embedding_Features model described in Section 4.3. The number of
unique words in the vocabulary used to train the model is 189,138.
The user context ⟨q1, q2, . . . , qT ⟩ is then converted to multiple
vector representations. Similar vector representations are also created
for all candidate target queries in the dataset.</p>
    </sec>
    <sec id="sec-9">
      <title>3.2 User Context Features</title>
      <p>
        In this section we propose different user context features based on
the queries issued in the session. Vector representations are created
both for the individual queries and for the entire context taking
all queries in the session together. vC represents the user context vector
and vqT represents the vector for a single query at time step T . We
develop four features based on the query representations learned
in the previous section, which we denote as Embedding_Features.
One embedding feature is based on all the queries in the
context. Since the median number of searches in a session is
approximately 3, we considered up to 3 queries previously issued by the user
for generating the remaining embedding features. The Embedding_Features
are computed as a distance between 2 vectors using cosine
similarity [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ].
      </p>
      <p>• user_context_cosine: Cosine distance between the user
context vector vC and the current target query vqT .
• prev_query1_cosine: Cosine distance between the query
vector vqT −1 and the current target query vqT .
• prev_query2_cosine: Cosine distance between the query
vector vqT −2 and the current target query vqT .
• prev_query3_cosine: Cosine distance between the query
vector vqT −3 and the current target query vqT .</p>
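A sketch of how these four cosine-distance features could be computed (the default value of 1.0 for sessions with fewer than three previous queries is our assumption for illustration, not necessarily the production choice):

```python
import math

def cosine_distance(u, v):
    # 1 - cosine similarity between two embedding vectors
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def embedding_features(context_vec, prev_query_vecs, target_vec):
    # prev_query_vecs holds vectors for q_{T-1}, q_{T-2}, q_{T-3},
    # most recent first; missing entries fall back to 1.0 (orthogonal)
    feats = {"user_context_cosine": cosine_distance(context_vec, target_vec)}
    for i in range(3):
        vec = prev_query_vecs[i] if i < len(prev_query_vecs) else None
        feats[f"prev_query{i + 1}_cosine"] = (
            cosine_distance(vec, target_vec) if vec is not None else 1.0
        )
    return feats
```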
      <p>
        In addition to the Embedding_Features, we also developed
various Textual_Features comparing the user context and the current
target query to be ranked, as defined in Table 1. We categorize them
into three categories, namely Token, Query, and Session. There is
a large overlap between the features defined in Table 1 and the
features defined in [
        <xref ref-type="bibr" rid="ref13 ref22">13, 22</xref>
        ]. A query qT can contain multiple tokens, and
users may add and/or remove tokens between 2 consecutive queries in
a session. This token reformulation behavior can be encoded via 16 features,
described in Table 1, representing the effectiveness of the tokens
in the context C and the target query qT . Similarly, Query level
features represent how users reformulate the queries in a session
through repetition and textual similarity between qT and qT −1. The
Session level features represent how users reformulate their queries
without taking into account the relationship to the target query qT .
      </p>
      <p>
        We conduct our ranking experiments on a large scale search query
dataset sampled from the logs of the eBay search engine. The query
log is segmented into sessions with a 30 minute session boundary
      </p>
      <sec id="sec-9-1">
        <title>Dataset</title>
        <p>[Figure 1: End-to-end architecture of the Textual_Embedding model: the prefix is sent to MPC to retrieve the top N candidates; Baseline_Features, Textual_Features, and Embedding_Features (from fastText context embeddings) are computed for the candidates and fed to the ranking model, which produces the final ranked list.]</p>
        <p>
          as described in [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. For the ranking experiments, we do not filter out
sessions containing a single query. This ensures that we
have a single learning-to-rank model powering sessions both with and
without user context. The dataset obtained in this way
consists of about 53% sessions with user context, giving
us good coverage of user context features to learn a global model.
        </p>
        <p>
          The labeling strategy used in [
          <xref ref-type="bibr" rid="ref13 ref22">13, 22</xref>
          ] assumes there is at least
one query in the context and removes target queries qT not matching
the prefix pT , setting the first character of the prefix pT based on
qT . In our method, we use a slightly different labeling strategy for
building the personalized QAC. We sample a set of impressions
from search logs. For each issued query qT , we capture the prefix
pT that led to the search page; this pair is marked as a positive label. For
the same prefix pT we identify the rest of the candidate queries QT \ qT
that were shown to the user and did not lead to the impression;
they are marked as negative labels. We also retain sessions without
user context.
        </p>
        <p>
          The above training data now consists of labeled prefix-query
pairs. To evaluate the performance of the lightweight query
representation of reformulations, we use LambdaMART [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], a boosted tree version of LambdaRank, as our
learning-to-rank algorithm.
LambdaMART is considered one of the state-of-the-art learning-to-rank
algorithms and won the Yahoo! Learning to Rank
Challenge (2010) [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. We use a pairwise ranking model and fine-tune
our parameters based on the Baseline_Ranker defined in Section
4.2. We fix these parameters to train and evaluate our models across
all of our experiments.
        </p>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>4.2 Baseline System</title>
      <p>To evaluate our new personalized QAC ranker we establish two
baseline ranking algorithms.</p>
      <p>
        • MPC: The Most Popular Completion model [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] provides
users with candidate queries ranked by
query popularity, where the popularity of a query is defined
as the number of times the query has been issued by all
users in the past.
• Baseline_Ranker: The baseline ranker is a learning-to-rank
model built using the same methodology for creating and
labeling the dataset. The features used in building the model
are prefix features, target query features, and prefix-query
features. We refer to these features as Baseline_Features. The
hyperparameters used for the LambdaMART model are
exactly the same as in all the experiments for the personalized
ranker.
      </p>
    </sec>
    <sec id="sec-11">
      <title>4.3 Personalized Ranking Models</title>
      <p>We have developed three personalized ranking models with
different combinations of user context features, as described in Section
3.2. These ranking models are compared against the two baseline
rankers by experimentally evaluating the improvements on eBay
datasets. The results are presented in Section 4.5.</p>
      <p>• Textual: Ranker with Baseline_Features and Textual_Features
representing the user context.
• Embedding: Ranker with Baseline_Features, as well as
Embedding_Features representing the user context.
• Textual_Embedding: Ranker with Baseline_Features,
Textual_Features, and Embedding_Features representing the user
context.</p>
      <p>For all of the ranking models we first get the top N candidate
queries from the MPC model and re-rank them with the
ranking model. We show the full end-to-end architecture for the
Textual_Embedding model in Figure 1. The architecture for the other
models is similar, except that they only include a subset of the
features.</p>
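This retrieve-then-re-rank flow can be sketched as follows (`score_fn` stands in for the trained LambdaMART model's per-candidate score; the function and names are ours, for illustration):

```python
def rerank(candidates, context, score_fn):
    # Re-rank MPC's top-N candidates by the learned model's score, highest first
    return sorted(candidates, key=lambda q: score_fn(q, context), reverse=True)
```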
    </sec>
    <sec id="sec-12">
      <title>4.4 Evaluation Metrics</title>
      <p>The quality of our predictions can be measured using the following
metrics:
• Mean Reciprocal Rank (MRR) - the average of the reciprocal
ranks of the target queries in the QAC results. Given a test
dataset S, the MRR for algorithm A is computed as</p>
      <p>MRR(A) = (1/|S |) ∑_{(CT ,qT ) ∈ S} 1 / hitRank(A, CT , qT ) (3)
where CT represents the user context at time step T , qT
represents the relevant target query, and the function hitRank
computes the rank of the relevant query based on the order
created by algorithm A. The relevant query is the clicked
query in QAC.
• Success Rate at Top-K (SR@K ) - the average percentage of
relevant queries ranked at or above position K in the QAC
results.
• Normalized Discounted Cumulative Gain (nDCG) - based on
the discounted cumulative gain
DCGq = ∑_{i=1}^{P} (2^{reli} − 1) / log2(i + 1) (4)
where i denotes the rank and reli is the relevance of the query
at rank i. For our purposes reli takes values 0 or 1.
nDCG is defined as normalized DCG. Namely, it is the ratio
of DCG to IDCG (ideal DCG):
nDCGq = DCGq / IDCGq (5)
where IDCGq is the maximum possible value of DCGq for
any ranker. The overall performance of the ranking algorithm
A is measured by the average nDCG across all queries in the
dataset:
nDCG = (∑_{q=1}^{Q} nDCGq) / Q (6)
• Mean Average Precision (MAP) - the mean of the average
precision scores for each query across the entire set of queries:
MAP = (∑_{q=1}^{Q} AvgPrecision(q)) / Q (7)</p>
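MRR as defined in Eq. (3) can be computed directly (a sketch; here a target missing from the returned list contributes 0 to the sum, which is a common convention but our assumption):

```python
def mean_reciprocal_rank(test_set, rank_fn):
    # test_set: iterable of (context, target_query) pairs
    # rank_fn(context): ordered list of suggested queries for that context
    total = 0.0
    for context, target in test_set:
        ranking = rank_fn(context)
        if target in ranking:
            total += 1.0 / (ranking.index(target) + 1)  # hitRank is 1-based
    return total / len(test_set)
```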
    </sec>
    <sec id="sec-13">
      <title>4.5 Results</title>
      <p>
        We perform our evaluation in two phases. Firstly, we evaluate the
quality of our query representations. Secondly, we evaluate the user
context embeddings against the user context based textual features
using a Learning to Rank framework [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        To evaluate our query representations we sample a few words
across different verticals like fashion, electronics, and home and garden,
to evaluate whether the embeddings represent the syntactic and
semantic knowledge of the queries learnt from the query
reformulation behavior. We use t-SNE [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] to visualize the embeddings for
these sampled queries and show that words like samsung, galaxy, and tv
are close to each other in the embedding space and far from queries
like adidas and iphone. This verifies that our query embeddings
carry good subword information to represent the user context in
the embedding space. The t-SNE plot for a small sample of queries
is shown in Figure 2.
      </p>
      <p>Offline metrics MRR, SR@K, nDCG, MAP, and MAP@K are shown
in Table 2, where we have normalized the metrics with respect to
the MPC model. We show results for the whole test dataset, which
includes queries with and without user context, as well as for the
dataset with user context only. To assess statistical significance
we have performed 1,000 bootstrap samples over the test queries
and computed 95% confidence intervals using those samples. The
metrics, together with the 95% confidence intervals, are plotted in
Figure 3, where the plots on the left are for the whole dataset and
the plots on the right are for the context only dataset. We have only
plotted one variant of each metric since the others are very similar.</p>
      <p>[Figure 3: Metrics ratio to MPC for the whole dataset (left) and the user context only dataset (right). Error bars are computed using 1,000 bootstrap samples of the test queries.]</p>
      <p>Our results show that all of the LTR models result in 20-30%
improvements over the MPC model. All three models with
contextualization features outperform the Baseline_Ranker on all
metrics with statistical significance. For example, for MAP@3 the
Embedding model outperforms Baseline_Ranker by 5% on the whole dataset
and 10% on the context only dataset. The Embedding model also
outperforms Textual, with an improvement of 1.5% for MAP@3
on the whole dataset and 3% on the context only dataset. The
Textual_Embedding model performs very similarly to
Embedding, which implies that the embedding based features proposed
in this work capture all of the information in the textual features
(from the perspective of the ranking model) and provide additional
significant improvements.</p>
    </sec>
    <sec id="sec-14">
      <title>4.6 Feature Analysis</title>
      <p>In this section we analyze the user context embedding features
through the partial dependence plots shown in Figure 4. The partial
dependence plot for the user_context_cosine feature clearly indicates
that the cosine similarity between the user context ⟨q1, q2, . . . , qT −1⟩
and the target query qT has a linear relationship with the target. The
embedding features based on individual time steps (prev_query1_cosine,
prev_query2_cosine, prev_query3_cosine) also show a clear
monotonic relationship.</p>
    </sec>
    <sec id="sec-15">
      <title>5 SUMMARY AND FUTURE WORK</title>
      <p>In this work we have presented a simple and scalable approach
to learn lightweight vector representations (embeddings) for the
query reformulations in a user session. These query representations
exhibit both syntactic and semantic similarities between different
queries and enable us to model the user context seamlessly. We
have leveraged these lightweight embeddings to represent the user
context in our personalized ranker for Query Auto-Completion.
Different combinations of user context features were created, including
textual and embedding features on top of our baseline ranker. We
have applied these personalization features to a large scale
commercial search engine (eBay) and experimentally verified significant
improvements on all the offline ranking metrics. We have
evaluated our personalized ranker on the entire dataset and on a dataset
restricted to sessions containing the user context. We see the biggest
improvements on the user context filtered dataset. Furthermore,
we show that the ranking model with embedding features
outperforms the model with the textual features, whereas the model with
combined textual and embedding features results in only minor
improvements on top of the model with embedding features alone.
The minor improvements from the textual features are likely due to
the session level features, which are agnostic of the queries in the
context. As future work, we would like to explore different
representation learning techniques like sent2vec, doc2vec, and sequence
models, to understand the user context better and incorporate them
in the personalized ranker. We also plan to explore the trade-offs
between short term and long term user contexts in QAC. Lastly,
the user context vectors provide a simple and scalable way to
understand user sessions, which can be utilized for personalizing
different parts of search and recommender systems.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Ziv</given-names>
            <surname>Bar-Yossef</surname>
          </string-name>
          and
          <string-name>
            <given-names>Naama</given-names>
            <surname>Kraus</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Context-sensitive Query Auto-completion</article-title>
          .
          <source>In Proceedings of the 20th International Conference on World Wide Web (WWW '11)</source>
          . ACM, New York, NY, USA,
          <fpage>107</fpage>
          -
          <lpage>116</lpage>
          . https://doi.org/10.1145/1963405.1963424
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Piotr</given-names>
            <surname>Bojanowski</surname>
          </string-name>
          , Edouard Grave, Armand Joulin, and
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Enriching word vectors with subword information</article-title>
          .
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>5</volume>
          (
          <year>2017</year>
          ),
          <fpage>135</fpage>
          -
          <lpage>146</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Christopher J.C.</given-names>
            <surname>Burges</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>From ranknet to lambdarank to lambdamart: An overview</article-title>
          .
          <source>Learning</source>
          <volume>11</volume>
          ,
          <fpage>23</fpage>
          -
          <lpage>581</lpage>
          (
          <year>2010</year>
          ),
          <fpage>81</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Fei</given-names>
            <surname>Cai</surname>
          </string-name>
          , Wanyu Chen, and
          <string-name>
            <given-names>Xinliang</given-names>
            <surname>Ou</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Learning search popularity for personalized query completion in information retrieval</article-title>
          .
          <source>Journal of Intelligent &amp; Fuzzy Systems</source>
          <volume>33</volume>
          ,
          <issue>4</issue>
          (
          <year>2017</year>
          ),
          <fpage>2427</fpage>
          -
          <lpage>2435</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Fei</given-names>
            <surname>Cai</surname>
          </string-name>
          and Maarten de Rijke.
          <year>2016</year>
          .
          <article-title>Selectively Personalizing Query AutoCompletion</article-title>
          .
          <source>In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '16)</source>
          . ACM, New York, NY, USA,
          <fpage>993</fpage>
          -
          <lpage>996</lpage>
          . https://doi.org/10.1145/2911451.2914686
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Fei</given-names>
            <surname>Cai</surname>
          </string-name>
          and Maarten de Rijke.
          <year>2016</year>
          .
          <article-title>A Survey of Query Auto Completion in Information Retrieval</article-title>
          .
          <source>Foundations and Trends® in Information Retrieval</source>
          <volume>10</volume>
          ,
          <issue>4</issue>
          (
          <year>2016</year>
          ),
          <fpage>273</fpage>
          -
          <lpage>363</lpage>
          . https://doi.org/10.1561/1500000055
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Wanyu</given-names>
            <surname>Chen</surname>
          </string-name>
          , Fei Cai, Honghui Chen, and Maarten de Rijke.
          <year>2017</year>
          .
          <article-title>Personalized query suggestion diversification</article-title>
          .
          <source>In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          . ACM,
          <fpage>817</fpage>
          -
          <lpage>820</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Danyang</given-names>
            <surname>Jiang</surname>
          </string-name>
          , Fei Cai, and
          <string-name>
            <given-names>Honghui</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Personalizing Query AutoCompletion for Multi-Session Tasks</article-title>
          .
          <fpage>203</fpage>
          -
          <lpage>207</lpage>
          . https://doi.org/10.1109/CCET.2018.8542201
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Nicolas</given-names>
            <surname>Fiorini</surname>
          </string-name>
          and
          <string-name>
            <given-names>Zhiyong</given-names>
            <surname>Lu</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Personalized neural language models for real-world query auto completion</article-title>
          . arXiv preprint arXiv:1804.06439
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Aaron</given-names>
            <surname>Jaech</surname>
          </string-name>
          and
          <string-name>
            <given-names>Mari</given-names>
            <surname>Ostendorf</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Personalized language model for query auto-completion</article-title>
          . arXiv preprint arXiv:1804.09661
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>L. C.</given-names>
            <surname>Jain</surname>
          </string-name>
          and
          <string-name>
            <given-names>L. R.</given-names>
            <surname>Medsker</surname>
          </string-name>
          .
          <year>1999</year>
          .
          <article-title>Recurrent Neural Networks: Design and Applications (1st ed.)</article-title>
          . CRC Press, Inc., Boca Raton, FL, USA.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>D.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Cai</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Neural Attentive Personalization Model for Query Auto-Completion</article-title>
          .
          <source>In 2018 IEEE 3rd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC)</source>
          .
          <fpage>725</fpage>
          -
          <lpage>730</lpage>
          . https://doi.org/10.1109/IAEAC.2018.8577694
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Jyun-Yu</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yen-Yu</given-names>
            <surname>Ke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Pao-Yu</given-names>
            <surname>Chien</surname>
          </string-name>
          , and Pu-Jen Cheng.
          <year>2014</year>
          .
          <article-title>Learning User Reformulation Behavior for Query Auto-completion</article-title>
          .
          <source>In Proceedings of the 37th International ACM SIGIR Conference on Research &amp; Development in Information Retrieval (SIGIR '14)</source>
          . ACM, New York, NY, USA,
          <fpage>445</fpage>
          -
          <lpage>454</lpage>
          . https://doi.org/10.1145/2600428.2609614
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Armand</given-names>
            <surname>Joulin</surname>
          </string-name>
          , Edouard Grave, Piotr Bojanowski, and
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Bag of tricks for efficient text classification</article-title>
          .
          <source>arXiv preprint arXiv:1607.01759</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Tie-Yan</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Learning to Rank for Information Retrieval</article-title>
          .
          <source>Foundations and Trends® in Information Retrieval</source>
          <volume>3</volume>
          ,
          <issue>3</issue>
          (
          <year>2009</year>
          ),
          <fpage>225</fpage>
          -
          <lpage>331</lpage>
          . https://doi.org/10.1561/1500000016
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Laurens</given-names>
            <surname>van der Maaten</surname>
          </string-name>
          and
          <string-name>
            <given-names>Geoffrey</given-names>
            <surname>Hinton</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Visualizing data using t-SNE</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>9</volume>
          , Nov
          (
          <year>2008</year>
          ),
          <fpage>2579</fpage>
          -
          <lpage>2605</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , Ilya Sutskever, Kai Chen, Greg Corrado, and
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Distributed Representations of Words and Phrases and Their Compositionality</article-title>
          .
          <source>In Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2 (NIPS'13)</source>
          . Curran Associates Inc., USA,
          <fpage>3111</fpage>
          -
          <lpage>3119</lpage>
          . http://dl.acm.org/citation.cfm?id=2999792.2999959
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Bhaskar</given-names>
            <surname>Mitra</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Exploring session context using distributed representations of queries and reformulations</article-title>
          .
          <source>In Proceedings of the 38th international ACM SIGIR conference on research and development in information retrieval</source>
          . ACM,
          <fpage>3</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Dae Hoon</given-names>
            <surname>Park</surname>
          </string-name>
          and
          <string-name>
            <given-names>Rikio</given-names>
            <surname>Chiba</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>A neural language model for query autocompletion</article-title>
          .
          <source>In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          . ACM,
          <fpage>1189</fpage>
          -
          <lpage>1192</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Pennington</surname>
          </string-name>
          , Richard Socher, and
          <string-name>
            <given-names>Christopher D.</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>GloVe: Global Vectors for Word Representation</article-title>
          .
          <source>In Empirical Methods in Natural Language Processing (EMNLP)</source>
          .
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          . http://www.aclweb.org/anthology/D14-1162
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Taihua</given-names>
            <surname>Shao</surname>
          </string-name>
          , Honghui Chen, and
          <string-name>
            <given-names>Wanyu</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Query Auto-Completion Based on Word2vec Semantic Similarity</article-title>
          .
          <source>In Journal of Physics: Conference Series</source>
          , Vol.
          <volume>1004</volume>
          . IOP Publishing,
          <fpage>012018</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Milad</given-names>
            <surname>Shokouhi</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Learning to Personalize Query Auto-completion</article-title>
          .
          <source>In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '13)</source>
          . ACM, New York, NY, USA,
          <fpage>103</fpage>
          -
          <lpage>112</lpage>
          . https://doi.org/10.1145/2484028.2484076
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Amit</given-names>
            <surname>Singhal</surname>
          </string-name>
          et al.
          <year>2001</year>
          .
          <article-title>Modern information retrieval: A brief overview</article-title>
          .
          <source>IEEE Data Eng. Bull.</source>
          <volume>24</volume>
          ,
          <issue>4</issue>
          (
          <year>2001</year>
          ),
          <fpage>35</fpage>
          -
          <lpage>43</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Yang</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Dengyong</given-names>
            <surname>Zhou</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Li-wei</given-names>
            <surname>He</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Post-ranking Query Suggestion by Diversifying Search Results</article-title>
          .
          <source>In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '11)</source>
          . ACM, New York, NY, USA,
          <fpage>815</fpage>
          -
          <lpage>824</lpage>
          . https://doi.org/10.1145/2009916.2010025
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>F.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Somaiya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mishra</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>EXOS: Expansion on session for enhancing effectiveness of query auto-completion</article-title>
          .
          <source>In 2015 IEEE International Conference on Big Data (Big Data)</source>
          .
          <fpage>1154</fpage>
          -
          <lpage>1163</lpage>
          . https://doi.org/10.1109/BigData.2015.7363869
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Aston</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Amit Goyal, Weize Kong, Hongbo Deng, Anlei Dong,
          <string-name>
            <given-names>Yi</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Carl A.</given-names>
            <surname>Gunter</surname>
          </string-name>
          , and Jiawei Han.
          <year>2015</year>
          .
          <article-title>adaQAC: Adaptive Query Auto-Completion via Implicit Negative Feedback</article-title>
          .
          <source>In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '15)</source>
          . ACM, New York, NY, USA,
          <fpage>143</fpage>
          -
          <lpage>152</lpage>
          . https://doi.org/10.1145/2766462.2767697
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>