<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Leverage Implicit Feedback for Context-aware Product Search</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Keping Bi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Choon Hui Teo</string-name>
          <email>choonhui@amazon.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yesh Dattatreya</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vijai Mohan</string-name>
          <email>vijaim@amazon.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>W. Bruce Croft</string-name>
          <email>croft@cs.umass.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Intelligent Information Retrieval, University of Massachusetts Amherst</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Implicit Feedback</institution>
          ,
          <addr-line>Product Search, Context-aware Search</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Search Labs</institution>
          ,
          <addr-line>Amazon</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <abstract>
        <p>Product search serves as an important entry point for online shopping. In contrast to web search, the retrieved results in product search not only need to be relevant but also should satisfy customers' preferences in order to elicit purchases. Previous work has shown the efficacy of purchase history in personalized product search [3]. However, customers with little or no purchase history do not benefit from personalized product search. Furthermore, preferences extracted from a customer's purchase history are usually long-term and may not always align with her short-term interests. Hence, in this paper, we leverage clicks within a query session, as implicit feedback, to represent users' hidden intents, which further act as the basis for re-ranking subsequent result pages for the query. Modeling user preference with implicit feedback has been studied extensively in recommendation tasks. However, there has been little research on modeling users' short-term interest in product search. We study whether short-term context can help promote users' ideal items in the following result pages for a query. Furthermore, we propose an end-to-end context-aware embedding model which can capture long-term and short-term context dependencies. Our experimental results on datasets collected from the search log of a commercial product search engine show that short-term context leads to much better performance compared with long-term and no context. Our results also show that our proposed model is more effective than word-based context-aware models.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Online shopping has become an important part of people’s daily
life in recent years. In 2017, e-commerce represented 8.2% of global
retail sales (2,197 billion dollars); 46.4% of internet users shop online
and nearly one-fourth of them do so at least once a week [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ].
Product search engines have become an important starting point
for online shopping. A number of consumer surveys have shown
that more online shoppers started searches on e-commerce search
engines (e.g., Amazon) rather than a generic web search engine
(e.g., Google) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        In contrast to document retrieval, where relevance is a universal
evaluation criterion, a product search system is evaluated based on
user purchases that depend on both product relevance and customer
preferences. Previous research on product search [
        <xref ref-type="bibr" rid="ref19 ref38 ref42 ref7 ref8">7, 8, 19, 38, 42</xref>
        ]
focused on product relevance. Several attempts [
        <xref ref-type="bibr" rid="ref27 ref44">27, 44</xref>
        ] were also
made to improve customer satisfaction by diversifying search
results. Ai et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] introduced a personalized ranking model which
takes the users’ preferences learned from their historical reviews
together with the queries as the basis for ranking. However, their
work has several limitations. First, the personalized model cannot
cope with situations such as users who have not logged in during
search and thus cannot be identified; users who are logged in but
do not have enough purchase history; and a single account being
shared by several family members. In these cases, user purchase
records are either unavailable or contain substantial noise.
Second, given a specific purchase need expressed as a search query,
long-term behaviors may not be as informative to indicate the user’s
preferences as short-term behaviors such as interactions with the
retrieved results. These limitations of existing work on product
search motivate us to model customers’ preferences based on their
interactions with search results, which do not require additional
customers’ information or their purchase history.
      </p>
      <p>
        Customers’ interactions with search results such as clicks can
be considered as implicit feedback based on their preferences. In
information retrieval (IR), there are extensive studies on how to
use users’ feedback on the relevance of top retrieved documents to
abstract a topic model and retrieve more relevant results [
        <xref ref-type="bibr" rid="ref21 ref33 ref46">21, 33, 46</xref>
        ].
These feedback techniques were shown to be very effective and can
also be applied to implicit feedback such as clicks. In contrast
to document retrieval, where a user's information need can usually
be satisfied by a single click on a relevant result, we observe that,
in product search, users tend to paginate to browse more products
and make comparisons before they make final purchase decisions.
In about 5% to 15% of search traffic, users browse and click results
in the previous pages and purchase items in the later result pages.
This provides us with the chance to collect user clicks more easily,
based on which results shown in the next page can be tailored to
meet the users’ preferences. We reformulate product search as a
dynamic ranking problem, where instead of one-shot ranking based
on the query, the unseen products are re-ranked dynamically when
users paginate to the next search result page (SERP) based on their
implicit feedback collected from previous SERPs.
      </p>
      <p>
        Traditional relevance feedback (RF) methods, which extract
word-based topic models from feedback documents as an expansion to the
original queries, have potential word mismatch problems despite
their effectiveness [
        <xref ref-type="bibr" rid="ref31 ref46">31, 46</xref>
        ]. To tackle this problem, we propose an
end-to-end context-aware embedding model that can incorporate
both long-term and short-term context to predict purchased items.
In this way, semantic match and the co-occurrence relationship
between clicked and purchased items are both captured in the
embeddings. We show the effectiveness of incorporating short-term
context against baselines using both no short-term context and
word-based context.
      </p>
      <p>In this paper, we leverage implicit feedback as short-term context
to provide users with more tailored search results. We first
reformulate product search as a dynamic ranking problem, i.e., when users
request next SERPs, the remaining unseen results will be re-ranked.
We then introduce several context dependency assumptions for the
task, and propose an end-to-end context-aware neural embedding
model that can represent each assumption by changing the
coefficients that combine long-term and short-term context. We further
investigated the effect of several factors in the task: short-term
context, long-term context, and neural embeddings. Our experimental
results on the datasets collected from search logs of a commercial
product search engine showed that incorporating short-term
context leads to better performance compared with long-term context
and no context, and embedding-based models perform better than
word-based methods in the task under various settings.</p>
      <p>Our contributions can be summarized as follows: (1) we
reformulate conventional one-shot ranking to dynamic ranking (i.e.,
multi-page search) based on user clicks in product search, which
has not been studied before; (2) we introduce diferent context
dependency assumptions and propose a simple yet efective
end-toend embedding model to capture diferent types of dependency; (3)
we investigate diferent aspects in the dynamic ranking task on real
search log data and confirmed the efectiveness of incorporating
short-term context and neural embeddings. Our study on
multipage product search indicates that this is a promising direction and
worth more attention.
</p>
    </sec>
    <sec id="sec-2">
      <title>RELATED WORK</title>
      <p>Next, we review three lines of research related to our work:
product search, session-aware recommendation, and user feedback for
information retrieval.
</p>
    </sec>
    <sec id="sec-3">
      <title>Product Search</title>
      <p>
        Product search has different characteristics compared with
general web search; product information is usually more structured
and the evaluation is usually based on purchases rather than clicks. In
2006, Jansen and Molina [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] noted that the links retrieved by an
e-commerce search engine are significantly better than those
obtained from general search engines. Since the basic properties of
products such as brands, categories and price are well-structured,
considerable work has been done on searching products based on
facets [
        <xref ref-type="bibr" rid="ref24 ref39">24, 39</xref>
        ]. However, user queries are usually in natural
language and hard to structure. To support keyword search, Duan
et al. [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ] extended the Query Likelihood method [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] by
considering the query generated from a mixture of the language model of
background corpus and the language model of the products
conditioned on their specifications. The ranking function constructed in
this approach utilizes exact word matching information whereas
vocabulary mismatch between free-form user queries and
product descriptions or reviews from other users can still be an issue.
Van Gysel et al. [
        <xref ref-type="bibr" rid="ref38">38</xref>
        ] noticed this problem and introduced a latent
vector space model which matches queries and products in the
semantic space. The latent vectors of products and words are learned
in an unsupervised way, where vectors of n-grams in the
description and reviews of the product are used to predict the product.
Later, Ai et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] built a hierarchical embedding model in which,
learned representations of users, queries, and products are used to
predict product purchases and associated reviews.
      </p>
      <p>
        Other aspects of product search such as popularity, visual
preference and diversity have also been studied. Li et al. [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] investigated
product retrieval from an economic perspective. Long et al. [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]
predicted the sales volume of items based on their transaction history
and incorporated this complementary signal with relevance for
product ranking. The effectiveness of images for product search was
also investigated [
        <xref ref-type="bibr" rid="ref11 ref6">6, 11</xref>
        ]. To satisfy different users' intents behind
the same query, efforts on improving result diversity in product
retrieval have also been made [
        <xref ref-type="bibr" rid="ref27 ref44">27, 44</xref>
        ].
      </p>
      <p>
        In terms of labels for training, there are studies on using clicks
as an implicit feedback signal. Wu et al. [
        <xref ref-type="bibr" rid="ref42">42</xref>
        ] jointly modeled clicks
and purchases in a learning-to-rank framework in order to
optimize the gross merchandise volume. To model clicks, they consider
click-through rate of an item for a given query in a set of search
sessions as the signal for training. Karmaker Santu et al. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]
compared the different effects of exploiting click rates, add-to-cart ratios,
and order rates as labels. They experimented with multiple representative
learning to rank models in product search with various settings.
Our work also uses clicks as implicit feedback signals, but instead
of aggregating all the clicks under the same query to get the
click-through rate, we consider the clicks associated with each query as
an indicator of the user’s short-term preference behind that query.
      </p>
      <p>
        Most previous work treats product search as a one-shot ranking
problem, where given a query, static results are shown to users
regardless of their interactions with the result lists. In a different
approach, Hu et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] formulated user behaviors while searching for
products as a Markov decision process (MDP) and used reinforcement
learning to optimize the accumulative gain (expected price) of user
purchases. They define the states in the MDP to be a non-terminal
state, from where users continue to browse, and two terminal states,
i.e. purchases happen (conversion events) or users abandon the
results (abandon events). Their method is essentially online learning
and refines the ranking model with large-scale users’ behavior data.
Although we work on a similar scenario, where the results shown
on the next page can be revised, they gradually refine an overall ranker
that affects all queries, while our model revises results for each
individual query based on an estimate of the user preference
under the query. Another difference is that they consider only
purchases as a deferred signal for training and do not use any clicks
in the process. In contrast, we treat clicks as an indicator of user
preferences and refine the ranking conditioned on those preferences.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Session-aware Recommendation</title>
      <p>
        In session-aware recommendation, a user’s interactions with the
previously seen items in the session are used for recommending
the next item. Considerable research on session-aware
recommendation has been done in the application domains such as news,
music, movies, and products. Many of these works are based on matrix
factorization [
        <xref ref-type="bibr" rid="ref13 ref16 ref32">13, 16, 32</xref>
        ]. More recently, session-aware
recommendation approaches based on neural networks have shown superior
performance. Hidasi et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] model the clickstream in a session
with Gated Recurrent Unit (GRU) and predict the next item to
recommend in the session. Twardowski [
        <xref ref-type="bibr" rid="ref37">37</xref>
        ] also used Recurrent
Neural Networks (RNN) but used attributes for item encoding and
recommended only on unseen items. Quadrana et al. [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ] proposed
a hierarchical RNN model, which consists of a session-level GRU
to model users’ activities within sessions and a user-level GRU to
model the evolution of the user across sessions. The updated user
representation then affects the session-level GRU to make
personalized recommendations. Wu and Yan [
        <xref ref-type="bibr" rid="ref41">41</xref>
        ] proposed a two-step
ranking method to recommend item lists based on user clicks and
views in the session. They treat item ranking as a classification
problem and learn the session representation in the first step. With
the session representation as context, items are reranked with a
list-wise loss proposed in ListNet in the second step. Li et al. [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]
adopted the attention mechanism in the RNN encoding process to
identify the user’s main purpose in the current session. Quadrana
et al. [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ] reviewed extensive previous work on sequence-aware
recommendation and categorized the existing methods in terms of
different tasks, goals, and types of context adaptation.
      </p>
      <p>
        The goal of a recommendation system is typically to help users
explore items that they may be interested in when they do not have
clear purchase needs. On the contrary, a search engine aims to help
users find only items that are most relevant to their intent specified
in search queries. Relevance thus plays very different roles in the two
tasks. In addition, the evaluation metrics in recommendation are
usually based on clicks [
        <xref ref-type="bibr" rid="ref12 ref23 ref30 ref37 ref41">12, 23, 30, 37, 41</xref>
        ], whereas product search
is evaluated with purchases under a query.
      </p>
    </sec>
    <sec id="sec-5">
      <title>User Feedback for Information Retrieval</title>
      <p>
        There are studies on two types of user feedback in information
retrieval, implicit feedback which usually considers click-through
data as the indicator of document relevance and explicit feedback
where users are asked to give the relevance judgments of a batch
of documents. Joachims et al. [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] found that click-through data as
implicit feedback is informative but biased and the relative
preferences derived from clicks are accurate on average. To separate click
bias from relevance signals, Craswell et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] designed a Cascade
Model by assuming that users examine search results from top
to bottom; Dupret and Piwowarski [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] proposed a User Browsing
Model where results can be skipped according to their examination
probability estimated from their positions and last clicks; Chapelle
and Zhang [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] constructed a Dynamic Bayesian Network model
which incorporates a variable to indicate whether a user is satisfied
by a click and leaves the result page. Yue and Joachims [
          <xref ref-type="bibr" rid="ref45">45</xref>
          ]
defined a dueling bandit problem where reliable relevance signals are
collected from users’ clicks on interleaved results to optimize the
ranking function. Learning an unbiased model directly from biased
click-through data has also been studied by incorporating inverse
propensity weighting and estimating the propensity [
        <xref ref-type="bibr" rid="ref18 ref2 ref40">2, 18, 40</xref>
        ]. In
this work, we model the user preference behind a search query
with her clicks and refine the following results shown to this user.
      </p>
      <p>
        Explicit feedback is also referred to as true relevance feedback
(RF) in information retrieval and has been extensively studied. Users
are asked to assess the relevance of a batch of documents based on
which the retrieval model is refined to find more relevant results.
Rocchio [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ] is generally credited as the first relevance feedback
method, which is based on the vector space model [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ]. After the
language model approach for IR was proposed [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ], the
relevance model version 3 (RM3) [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] became one of the state-of-the-art
pseudo RF methods that is also effective for relevance feedback.
Zamani and Croft [
        <xref ref-type="bibr" rid="ref46">46</xref>
        ] incorporated the semantic match between
unsupervised trained word embeddings into the language model
framework and introduced an embedding-based relevance model
(ERM). Although these RF methods can also be applied in our task,
we propose an end-to-end neural model for relevance feedback in
the context of product search.
      </p>
    </sec>
    <sec id="sec-6">
      <title>CONTEXT-AWARE PRODUCT SEARCH</title>
      <p>We reformulate product search as a dynamic re-ranking task where
short-term context represented by the clicks in the previous SERPs
is considered for re-ranking subsequent result pages. Users’ global
interests can also be incorporated for re-ranking as long-term
context. We first introduce our problem formulation and different
assumptions of context dependency models. Then we propose a
context-aware embedding model for the task and show how to
optimize the model.
</p>
    </sec>
    <sec id="sec-7">
      <title>Problem Formulation</title>
      <p>A query session1 is initiated when a user u issues a query q to the
search engine. The search results returned by the search engine are
typically grouped into pages with a similar number of items. Let Rt
be the set of items on the t-th search result page ranked by an initial
ranker, and denote by R1:t the union of R1, · · · , Rt. For practical
purposes, we let the re-ranking candidate set Dt+1 for page t + 1
be R1:t+k ∖ V1:t, where k ≥ 1 and V1:t is the set of re-ranked items
viewed by the user in the first t pages. Given user u, query q, and the
set of clicked items in the first t pages C1:t as context, the objective
is to rank all, if any, purchased items Bt+1 in Dt+1 at the top of the
next result page.</p>
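The candidate-set construction above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the page contents, `t`, and `k` below are made-up toy values.

```python
def candidate_set(pages, t, viewed, k=1):
    """Build the re-ranking candidate set D_{t+1} = R_{1:t+k} \\ V_{1:t}.

    pages  -- list of result pages, each a set of item ids (R_1, R_2, ...)
    t      -- number of pages the user has already seen
    viewed -- set of re-ranked items viewed in the first t pages (V_{1:t})
    k      -- how many pages ahead to draw candidates from (k >= 1)
    """
    r_union = set().union(*pages[: t + k])  # R_{1:t+k}
    return r_union - viewed                 # drop already-viewed items

# Toy example: three pages of results; the user has viewed pages 1-2.
pages = [{"a", "b"}, {"c", "d"}, {"e", "f"}]
viewed = {"a", "b", "c", "d"}
print(sorted(candidate_set(pages, t=2, viewed=viewed, k=1)))  # ['e', 'f']
```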
    </sec>
    <sec id="sec-8">
      <title>Context Dependency Models</title>
      <p>There are three types of context dependencies that one can use to
model the likelihood of a user purchasing a product in her query
session1, namely, long-term context, short-term context, and
long-short-term context. (1We refer to the series of user behaviors associated
with a query as a query session, i.e., a user issues a query, clicks results,
paginates, purchases items, and finally ends searching with the query.)
Figure 1 shows the graphical models for these
context dependencies, where u denotes the latent variable of a
user's long-term interest that stays the same across all the search
sessions, and the clicks in the first t result pages, i.e., C1:t, represent
the user's short-term preference. Purchased items on and after page
t + 1, i.e., Bt+1, depend on the query q and different types of context
under different dependency assumptions.</p>
      <p>
        Long-term Context Dependency. In this assumption, only
users’ long-term preferences, usually represented by their historical
queries and the corresponding purchased items, are used to predict
the purchases in their current query sessions. An unshown item i is
ranked according to its probability of being purchased given u and
q, namely p(i ∈ Bt+1 | u, q). The advantage of such models is that
personalization of search results (as proposed in Ai et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]) can be
conducted from the very beginning of a query session when there is
no feedback information available. However, this model needs user
identity and purchase history, which are not always available. In
addition, the long-term context may not be informative to predict a
user’s final purchases since her current search intent may be totally
different from any of her previous searches and purchases.
      </p>
      <p>Short-term Context Dependency. The shortcomings of
long-term context can be addressed by focusing on just the short-term
context, i.e., the user’s actions such as clicks performed within the
current query session. This dependency model assumes that given
the observed clicks in the first t pages, the items purchased in the
subsequent result pages are conditionally independent of the user,
shown in Figure 1. An unseen item i in the query session is
re-ranked based on its purchase probability conditioned on C1:t and
q, i.e., p(i ∈ Bt+1 | C1:t, q). In this way, users’ short-term preferences
are captured and their identity and purchase records are not needed.
Users with little or no purchase history and who have not logged
in can benefit directly under such a ranking scheme.</p>
      <p>Long-short-term Context Dependency. The third dependency
assumption is that purchases in the subsequent result pages
depend on both short-term context, e.g., previous clicks in the current
query session, and long-term context, such as historical queries and
purchases of the user indicated by u. An unseen item i after page t
is scored according to p(i ∈ Bt+1 | C1:t, q, u). This setting considers
more information, but it also has the drawback of requiring user
identity and purchase history.</p>
      <p>We will introduce how to model the three dependency
assumptions in the same framework in Section 3.3. In this paper, we focus
on the case of non-personalized short-term context and include the
other two types of context for comparison.</p>
    </sec>
    <sec id="sec-9">
      <title>Context-aware Embedding Model</title>
      <p>
        We designed a context-aware framework where models under
different dependency assumptions can be trained by varying the
corresponding coefficients, shown in Figure 2. To incorporate semantic
meanings and avoid the word mismatch between queries and items,
we embed queries, items and users into latent semantic space. Our
context-aware embedding model is referred to as CEM. We assume
users’ preferences are reflected by their implicit feedback, i.e. their
clicks associated with the query. Similar to relevance feedback
approaches [
        <xref ref-type="bibr" rid="ref21 ref33">21, 33</xref>
        ] that extract a topic model from assessed relevant
documents, our model should capture user preferences from their
clicked items which are implicit positive signals. Components of
CEM will be introduced next.</p>
      <p>Item Embeddings. We use product titles to represent products
since merchants tend to put the most informative, representative
text such as the brand, name, size, color, material and even target
customers in product titles. In this way, items do not have unique
embeddings according to their identifiers and items with the same
titles are considered the same. Although this may not be accurate
all the time, word representations can be generalized to new items,
and we do not need to cope with the cold-start problem. We use the
average of title word embeddings of a product as its own embedding,
i.e.,</p>
      <p>E(i) = (Σw ∈ i E(w)) / |i|   (1)
where i is the item and |i| is the title length of item i. We also
evaluated other more complex product title encoding approaches
such as non-linear projection of average word embeddings and
recurrent neural network on title word sequence, but they did not
show superior performance over the simpler one that we use here.</p>
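Equation 1 (and, analogously, Equation 2 for queries and Equation 3 for click sets) is a plain average of word vectors. The following is a minimal sketch under that reading, with a toy hand-made vocabulary standing in for the learned word-embedding lookup table; the handling of out-of-vocabulary words is our own assumption for illustration.

```python
import numpy as np

def embed_text(words, word_emb, dim):
    """Average word embeddings: E(i) = (sum_{w in i} E(w)) / |i|.

    words    -- token list of a product title (or a query)
    word_emb -- dict mapping word -> embedding vector
    dim      -- embedding dimensionality (used for the empty fallback)
    """
    vecs = [word_emb[w] for w in words if w in word_emb]  # skip OOV words
    if not vecs:
        return np.zeros(dim)  # fallback when no word is in the vocabulary
    return np.mean(vecs, axis=0)

# Toy 3-d vocabulary (illustrative values, not trained embeddings).
word_emb = {"red": np.array([1.0, 0.0, 0.0]),
            "shoe": np.array([0.0, 1.0, 0.0])}
print(embed_text(["red", "shoe"], word_emb, dim=3))  # [0.5 0.5 0. ]
```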
      <p>User Embeddings. A lookup table for user embeddings is
created and used for training, where each user has a unique
representation. This vector is shared across search sessions and updated by
the gradient learned from previous user transactions. In this way,
the long-term interest of the user is captured and we use the user
embeddings as long-term context in our models.</p>
      <p>Query Embeddings. Similar to item embeddings, we use the
simple average embedding of query words as the representation,
which also shows the best performance compared to the non-linear
projection and recurrent neural network methods we have tried.
The embedding of the query is</p>
      <p>E(q) = (Σw ∈ q E(w)) / |q|   (2)
where |q| is the length of query q.</p>
      <sec id="sec-9-1">
        <title>Short-term Context Embeddings</title>
        <p>We use the set of clicked items to represent the user preference
behind the query, which we refer to as E(C1:t). For sessions associated with a different query
q or page number t, the clicked items contained in C1:t may differ.
We assume the sequence of clicked items does not matter when
modeling short-term user preference, i.e., the same set of clicked
items should imply the same user preference regardless of the order
of them being clicked. There are two reasons for this assumption.
One is that the user’s purchase need is fixed for a query she issued
and is not affected by the order of clicks. The other is that the order
of user clicks is usually based on the rank of retrieved products
from top to bottom as the user examines each result, which is not
affected by user preference in the non-personalized search results.
So we represent the set as the centroid of each clicked item in the
latent semantic space, where the order of clicks does not make a
difference. A simple yet effective way is to consider equal weights
of all the items in C1:t so that the centroid is simply averaged item
embeddings:</p>
        <p>E(C1:t) = (Σi ∈ C1:t E(i)) / |C1:t|   (3)
where |C1:t| is the number of clicked items in the set C1:t.</p>
        <p>We also tried an attention mechanism to weight each clicked
item according to the query and represent the user preference with a
weighted combination of clicked items. However, this method is not
better than combining clicks with equal weights in our experiments.
So we report only the simpler method.</p>
      </sec>
      <sec id="sec-9-2">
        <title>Overall Context Embeddings</title>
        <p>We use a convex combination of user, query, and click embeddings
as the representation of the overall context E(St), i.e.,</p>
        <p>E(St) = (1 − λu − λc) E(q) + λu E(u) + λc E(C1:t),  where 0 ≤ λu ≤ 1, 0 ≤ λc ≤ 1, λu + λc ≤ 1   (4)
This overall context is then treated as the basis for predicting
purchased items in Bt +1. When λc = 0, C1:t is ignored in the prediction
and St corresponds to the long-term context shown in Figure 1.
When λu = 0, user u does not have an impact on the final purchase
given C1:t . This aligns with the short-term context assumption in
Figure 1. When λu &gt; 0, λc &gt; 0, λu + λc ≤ 1, both long-term and
short-term context are considered, and this matches the
long-short-term context in Figure 1. So by varying the values of λu
and λc , we can use Equation 4 to model different types of context
dependency and make comparisons.</p>
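Equation 4 is a plain convex combination, which can be sketched as follows (illustrative Python; names are ours). Setting λc = 0 yields the long-term context, λu = 0 the short-term context, and both positive the long-short-term context:

```python
import numpy as np

def overall_context(e_q, e_u, e_c, lambda_u, lambda_c):
    """Convex combination of query, user, and click embeddings (Equation 4)."""
    assert 0.0 <= lambda_u <= 1.0 and 0.0 <= lambda_c <= 1.0
    assert lambda_u + lambda_c <= 1.0
    # E(S_t) = (1 - lambda_u - lambda_c) E(q) + lambda_u E(u) + lambda_c E(C_1:t)
    return ((1.0 - lambda_u - lambda_c) * np.asarray(e_q)
            + lambda_u * np.asarray(e_u)
            + lambda_c * np.asarray(e_c))
```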
        <p>
          Attention Allocation Model for Items. With the overall
context collected from the first t pages, we further construct an
attentive model to re-rank the products in the candidate set Dt +1. This
re-ranking process can be considered as an attention allocation
problem. Given the context that indicates the user’s preference and
a set of candidate items that have not been shown to the user
yet, the item that attracts more user attention will have a higher
probability of being purchased. The attention weights then act as the
basis for re-ranking. Predicting the probability of each candidate
item being purchased can be considered as attention allocation for
the items. This idea is also similar to the listwise context model
proposed by Ai et al. [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. They extracted a topic model from
top-ranked documents with recurrent neural networks and used it as
a local context to re-rank the top documents with their attention
weights. The attention weights can be computed as:
score (i |q, u, C1:t ) = exp(E (St ) · E (i )) / Σi′ ∈Dt+1 exp(E (St ) · E (i ′)) (5)
        </p>
        <p>
where E (St ) is computed according to Equation 4. This model can
also be interpreted as a generative model for an item in the candidate
set Dt +1 given the context St . In this case, the probability of an
item in the candidate set Dt +1 being generated from the context
St is computed with a softmax function that takes the dot product
score between the embedding of an item and the context as input,
i.e.,
p (i |C1:t , u, q) = score (i |q, u, C1:t ) (6)
We need to train the model and learn appropriate embeddings
of context and items so that the probability of purchased items
in Dt +1, namely Bt +1, should be larger than the other candidate
items, i.e. Dt +1 ⧹Bt +1. Also, the conditional probability in Equation
6 can be used to compute the likelihood of the observed instance
of C1:t , u, q, Bt +1.</p>
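Equations 5 and 6 amount to a softmax over context–item dot products, as in this sketch (illustrative Python; names are ours):

```python
import numpy as np

def attention_allocation(e_context, candidate_embeddings):
    """Softmax over dot products between the context embedding E(S_t) and
    each candidate item embedding in D_{t+1} (Equations 5 and 6)."""
    logits = np.asarray(candidate_embeddings) @ np.asarray(e_context)
    logits = logits - logits.max()   # subtract max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()   # purchase probabilities, sum to 1 over D_{t+1}
```

A candidate whose embedding has a larger dot product with the context receives a larger share of the attention.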
        <p>The embeddings of queries, users, and items are learned by maximizing
the likelihood of observing Bt +1 given C1:t , u, q,
i.e., after user u issued query q and clicked the items in the first t
SERPs (C1:t ), the models are learned by maximizing the likelihood
of her finally purchasing the items in Bt +1, which are shown on and
after page t + 1. There can be multiple values of t even for the same
user u if she purchases products on different result pages
under query q. These are considered as different data entries. Then
the log likelihood of observing purchases in Bt +1 conditioned on
C1:t , u, q in our model can be computed as</p>
        <p>L (Bt +1 |C1:t , u, q) = log p (Bt +1 |C1:t , u, q) ∝ log Πi ∈Bt+1 p (i |C1:t , u, q) (7)
∝ Σi ∈Bt+1 log p (i |C1:t , u, q) (8)</p>
        <p>The second step follows if we assume that whether an item
will be purchased is independent of the other items given the context.</p>
      <p>According to Equations 5, 6 and 7, we can optimize the
conditional log-likelihood directly. A common problem for softmax
calculation is that the denominator usually involves a large
number of terms and is impractical to compute. However, this is not
a problem in our model since we limit the candidate set Dt +1 to
only some top-ranked items retrieved by the initial ranker, so that
the computation cost is small.</p>
        <p>
          Similar to previous studies [
          <xref ref-type="bibr" rid="ref3 ref38">3, 38</xref>
          ], we apply L2 regularization on
the embeddings of words and users to avoid overfitting. The final
optimization goal can be written as
        </p>
        <p>L′ = Σu,q,t L (Bt +1 |C1:t , u, q) + γ ( Σw E (w )2 + Σu E (u )2 )
∝ Σu,q,t Σi ∈Bt+1 log [ exp(E (St ) · E (i )) / Σi′ ∈Dt+1 exp(E (St ) · E (i ′)) ] + γ ( Σw E (w )2 + Σu E (u )2 ) (9)
where γ is the hyper-parameter that controls the strength of the L2
regularization. The function accumulates the entries of all possible
users u, queries q, and valid page numbers t for pagination, i.e., t with
clicks in and before page t and purchases after that page. All
possible words and users are taken into account in the regularization.
When we do not incorporate long-term context, the corresponding
parts of u are omitted.</p>
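The per-session term of this objective (negative conditional log-likelihood of the purchased items plus L2 regularization) can be sketched as follows (illustrative Python; function and argument names are ours, and the word/user embedding matrices stand in for the regularized parameters):

```python
import numpy as np

def session_loss(e_context, candidate_embeddings, purchased_idx,
                 word_embeddings, user_embeddings, gamma):
    """Negative log-likelihood of the purchased items B_{t+1} under the
    softmax over candidates D_{t+1}, plus L2 regularization on word and
    user embeddings (a sketch of the final optimization goal)."""
    logits = np.asarray(candidate_embeddings) @ np.asarray(e_context)
    m = logits.max()
    # log-softmax: logits minus log-sum-exp, computed stably
    log_probs = logits - (m + np.log(np.exp(logits - m).sum()))
    nll = -log_probs[list(purchased_idx)].sum()
    reg = gamma * (np.sum(np.asarray(word_embeddings) ** 2)
                   + np.sum(np.asarray(user_embeddings) ** 2))
    return nll + reg
```

Minimizing this quantity is equivalent to maximizing Equation 9 for one (u, q, t) entry.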
        <p>
          The loss function actually captures the loss of a list and this
list-wise loss is similar to AttentionRank proposed by Ai et al. [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
Because of the softmax function, optimizing the probabilities of
relevant instances in Bt +1 simultaneously minimizes the probabilities
of the remaining non-relevant instances. This loss shows superiority over
other list-wise losses such as ListMLE [
          <xref ref-type="bibr" rid="ref43">43</xref>
          ] and SoftRank [
          <xref ref-type="bibr" rid="ref36">36</xref>
          ], which
is another reason we adopt this loss.
        </p>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>EXPERIMENTAL SETUP</title>
      <p>In this section, we introduce our experimental settings for
context-aware product search. We first describe how we construct the
datasets for experiments. Then we describe the baseline methods
and evaluation methodology for comparing different methods. We
also introduce the training settings for our model.</p>
    </sec>
    <sec id="sec-11">
      <title>Datasets</title>
      <p>We randomly sampled three category-specific datasets, namely,
“Toys &amp; Games”, “Garden &amp; Outdoor”, and “Cell Phones &amp;
Accessories”, from the logs of a commercial product search engine
spanning ten months between years 2017 and 2018. We keep only
the query sessions with at least one clicked item on any page before
the pages with purchased items. These sessions are difficult for the
production model since it could not rank the “right” items on the
top so that users purchased items in the second or later result pages.
Our datasets include up to a few million query sessions containing
several hundred thousand unique queries. When there are multiple
purchases in a query session across different result pages, purchases
until page t are only considered as clicks and used together with
other clicks to predict purchases on and after page t + 1. Statistics
of our datasets are shown in Table 1.</p>
    </sec>
    <sec id="sec-12">
      <title>Evaluation Methodology</title>
      <p>We divided each dataset into training, validation, and test sets by
the date of the query sessions. The sessions that occurred in the first 34
weeks are used for training, the following 2 weeks for validation,
and the last 4 weeks for testing. Models were trained with data
in the training set; hyper-parameters were tuned according to the
model performance on the validation set, and evaluation results on
the test set were reported for comparison.</p>
      <p>
        Since the datasets are static, it is impossible to evaluate the
models in a truly interactive setting where each subsequent page is
re-ranked based on the observed clicks on the current and previous
pages. Nonetheless, we can still evaluate the performance of
one-shot re-ranking from page t + 1 given the context collected from the
first t pages. In our experiments, we compare different methods for
re-ranking from page 2 and page 3, since earlier re-ranking can
influence results at higher positions, which have a larger impact
on the ranking performance. As in relevance feedback experiments
[
        <xref ref-type="bibr" rid="ref26 ref33">26, 33</xref>
        ], our evaluation is also based on residual ranking, where the
first t result pages are discarded and only the re-ranking of the unseen items
is evaluated. We use the residual ranking evaluation paradigm
because the results before re-ranking are retrieved by the same
initial ranker and identical for all the re-ranking methods.
      </p>
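Residual ranking reduces to dropping the already-seen results before re-ranking, as in this sketch (illustrative Python; `page_size` is a hypothetical constant for results per SERP, not from the paper):

```python
def residual_candidates(ranked_items, t, page_size):
    """Residual ranking evaluation: the first t result pages are discarded,
    and only the unseen remainder is re-ranked and scored."""
    return ranked_items[t * page_size:]
```

All re-ranking methods then start from the same residual list produced by the initial ranker, so the comparison is fair.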
      <p>Similar to other ranking tasks, we use mean average precision
(MAP ) at cutoff 100, mean reciprocal rank (MRR), and normalized
discounted cumulative gain (N DCG) as ranking metrics. MAP
measures the overall performance of a ranker in terms of both precision
and recall, which indicates the ability to retrieve more purchased
items in the next 100 results and rank them at higher positions. MRR
is the average inverse rank for the first purchase in the retrieved
items. It indicates the expected number of products users need to
browse before finding the ones they are satisfied with. N DCG is a
common metric for multiple-label document ranking. Although in
our context-aware product search, items only have binary labels
indicating whether they were purchased given the context, N DCG
still shows how good a rank list is with emphasis on results at top
positions compared with the ideal rank list. We use N DCG@10 in
our experiments.</p>
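For a single query session with binary purchase labels, the two rank-sensitive metrics above can be sketched as follows (illustrative Python; function names are ours):

```python
import math

def reciprocal_rank(ranked_labels):
    """1/rank of the first purchased item (0 if none); averaging this over
    query sessions gives MRR."""
    for pos, label in enumerate(ranked_labels, start=1):
        if label:
            return 1.0 / pos
    return 0.0

def ndcg_at_k(ranked_labels, k=10):
    """NDCG@k with binary purchase labels: DCG of the list divided by the
    DCG of the ideal (all purchases first) ordering."""
    def dcg(labels):
        return sum(l / math.log2(pos + 1)
                   for pos, l in enumerate(labels[:k], start=1))
    if not any(ranked_labels):
        return 0.0
    return dcg(ranked_labels) / dcg(sorted(ranked_labels, reverse=True))
```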
    </sec>
    <sec id="sec-13">
      <title>Baselines</title>
      <p>We compare our short-term context-aware embedding model (SCEM)
with four groups of baselines: retrieval models that do not use context,
and long-term, short-term, and long-short-term context-aware models.</p>
      <p>Production Model (PROD). PROD is essentially a gradient
boosted decision tree based model. Comparing with this model
indicates the potential gain of our model if deployed online. Note
that PROD performs worse on our datasets than on the entire search
traffic since we extracted query sessions where the purchased items
are in the second or later result pages.</p>
      <p>Random (RAND). By randomly shuffling the results in the
candidate set, which consists of the top unseen items retrieved by
the production model, we get the performance of a random
re-ranking strategy. This performance should be the lower bound of
any reasonable model.</p>
      <p>
        Popularity (POP). In this method, the products in the candidate
set are ranked according to how many times they were purchased
in the training set. Popularity is an important factor for product
search [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] besides relevance.
      </p>
      <sec id="sec-13-1">
        <title>Query Likelihood Model (QL)</title>
        <p>
          The query likelihood model (QL) [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ] is a language model approach for information retrieval. It
shows the performance of re-ranking without implicit feedback and
is only based on the bag-of-words representation. The smoothing
parameter µ in QL was tuned from {10, 30, 50, 100, 300, 500}.
        </p>
      </sec>
      <sec id="sec-13-2">
        <title>Query Embedding based Model (QEM)</title>
        <p>This model scores an item by the generative probability of the item given the embedding
of a query. When λu = 0, λc = 0, CEM is exactly QEM.</p>
      </sec>
      <sec id="sec-13-3">
        <title>Long-term Context-aware Relevance Model (LCRM3)</title>
        <p>
          Relevance Model Version 3 (RM3) [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] is an effective method for both
pseudo and true relevance feedback. It extracts a bag-of-words
language model from a set of feedback documents, expands the original
query with the most important words from the language model,
and retrieves results again with the expanded query. To capture the
long-term interest of a user, we use RM3 to extract significant words
from titles of the user’s historical purchased products and refine the
retrieval results for the user in the test set with the expanded query.
The weight of the initial query was tuned from {0, 0.2, · · · , 1.0}
and the expansion term count was tuned from {10, 20, · · · , 50}. The
effect of query weight is shown in Section 5.2.
        </p>
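The RM3-style expansion described above can be sketched as follows (illustrative Python; all names are ours, and the estimator uses a uniform document prior with maximum-likelihood term probabilities, which is a simplification of RM3):

```python
from collections import Counter

def rm3_expand(query_terms, feedback_docs, orig_weight, n_terms):
    """Estimate a term distribution from feedback documents (here, titles
    of the user's historically purchased products), keep the n_terms most
    likely terms, and interpolate with the original query's uniform
    term distribution."""
    fb = Counter()
    for doc in feedback_docs:
        for term in doc:
            # uniform prior over feedback docs, ML term estimate within a doc
            fb[term] += 1.0 / (len(doc) * len(feedback_docs))
    top = dict(fb.most_common(n_terms))
    z = sum(top.values()) or 1.0              # renormalize the kept terms
    expanded = {t: orig_weight / len(query_terms) for t in query_terms}
    for term, p in top.items():
        expanded[term] = expanded.get(term, 0.0) + (1.0 - orig_weight) * p / z
    return expanded                           # expanded query language model
```

The original-query weight (`orig_weight` here) and the expansion term count are the two hyper-parameters tuned in the ranges given above.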
      </sec>
      <sec id="sec-13-4">
        <title>Long-term Context-aware Embedding Model (LCEM)</title>
        <p>When λc = 0 and 0 &lt; λu ≤ 1, CEM becomes LCEM by considering the long-term
context indicated by universal user representations.</p>
        <p>Short-term Context-aware Relevance Model (SCRM3). We
also use RM3 to extract the user preference behind a query from
the clicked items in the previous SERPs as short-term context and
refine the next SERP. This method uses the same information as our
short-term context-aware embedding model, but it represents user
preference with a bag-of-words model and only considers exact word
matches between a candidate item and the user preference
model. The query weight and expansion term count were tuned in
the same ranges as LCRM3, and the influence of the initial query weight
can be found in Section 5.2.²</p>
      </sec>
      <sec id="sec-13-5">
        <title>Long-short-term Context-aware Embedding Model (LSCEM)</title>
        <p>When λu &gt; 0, λc &gt; 0, 0 &lt; λu + λc ≤ 1, both long-term context
represented by u and short-term context indicated by Ct are taken
into account in CEM.</p>
        <p>PROD, RAND, POP, QL, and QEM are retrieval models that rank
items based on queries and do not rely on context or user
information. These models can be used as the initial ranker for any queries.
The second type of rankers consider users’ long-term interests
together with queries, such as LCEM and LCRM3. These methods
utilize users’ historical purchases but can only be applied to users
who appear in the training set. The third type is feedback models
which take users’ clicks in the query session as short-term context
and this category includes SCRM3 and our SCEM. In this approach,
user identities are not needed. However, they can only be applied
to search sessions where users click on results and only items from
the second result page or later can be refined with the clicks. The
fourth category considers both long and short-term context, e.g.,
LSCEM. The second, third and fourth groups of baseline correspond
to the dependency assumptions shown in the first, second and third
sub-figure in Figure 1 respectively.
4.4</p>
      </sec>
    </sec>
    <sec id="sec-14">
      <title>Model Training</title>
      <p>
Query sessions with multiple purchases on different pages are split
into sub-sessions, one for each page with a purchase. When there
are more than three sub-sessions for a given session, we randomly
select three in each training epoch. We do so to avoid skewing the
dataset with sessions with many purchases. Likewise, we randomly
select five clicked items for constructing short-term context if there
are more than five clicked items in a query session.</p>
        <p>²We also implemented the embedding-based relevance model (ERM) [
        <xref ref-type="bibr" rid="ref46">46</xref>
        ], which is
an extension of RM3 that takes semantic similarities between word embeddings into
account, as a context-aware baseline. However, it does not perform better than RM3 across
different settings, so we did not include it.
      </p>
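The per-epoch sub-session capping described above can be sketched as follows (illustrative Python; the function name and the cap of three are taken from the text, the rest is ours):

```python
import random

def sample_subsessions(subsessions, max_per_session=3):
    """Cap the number of purchase sub-sessions per query session in each
    training epoch so that sessions with many purchases do not skew the
    dataset; clicked items can be capped analogously (at five)."""
    if len(subsessions) <= max_per_session:
        return list(subsessions)
    return random.sample(subsessions, max_per_session)
```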
      <p>
        We implemented our models with Tensorflow. The models were
trained for 20 epochs with the batch size set to 256. Adam [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] was
used as the optimizer and the global norm of parameter gradients
was clipped at 5 to avoid unstable gradient updates. After each
epoch, the model was evaluated on the validation set and the model
with the best performance on the validation set was selected to
be evaluated on the test set. The initial learning rate was selected
from {0.01, 0.005, 0.001, 0.0005, 0.0001}. L2 regularization strength
γ was tuned from 0.0 to 0.005. λu and λc in Equation 4 were tuned from
{0, 0.2, · · · , 0.8, 1.0} (λu + λc ≤ 1) to represent the various dependency
assumptions mentioned in Section 3.2, and the embedding size was
scanned from {50, 100, · · · , 300}. The effects of λu , λc and the embedding
size are shown in Section 5.
      </p>
    </sec>
    <sec id="sec-15">
      <title>RESULTS AND DISCUSSION</title>
      <p>In this section, we show the performance of the four types of models
mentioned in Section 4.3. First, we compare the overall retrieval
performance of various types of models in Section 5.1. Then we
further study the effects of queries, long-term context, and embedding
size on each model in the following subsections.</p>
    </sec>
    <sec id="sec-16">
      <title>Overall Retrieval Performance</title>
      <p>Table 2 shows the performance of different methods on re-ranking
items when users paginate to the second and third SERPs for Toys &amp;
Games, Garden &amp; Outdoor, and Cell Phones &amp; Accessories. Among
all the methods, SCEM and SCRM3 perform better than all the
baselines that do not use short-term context, including their
corresponding retrieval baselines, QEM and QL respectively, and
PROD, which considers many additional features, showing the
effectiveness of incorporating short-term context.</p>
      <p>In contrast to the effectiveness of short-term context, long-term
context does not help much, whether combined with queries alone or
together with short-term context. LCRM3 outperforms QL on all
the datasets by a small margin when users’ historical purchases
are used to represent their preferences. LCEM and LSCEM always
perform worse than QEM and SCEM when incorporating long-term
context with λu &gt; 0. Note that since only a small portion of users in
the test set appear in the training set, the re-ranking performance
of most query sessions in the test set will not be affected. We will
elaborate on the effect of long-term context in Section 5.3.</p>
      <p>We found that neural embedding methods are more effective
than word-based baselines. When implicit feedback is not
incorporated, QEM performs significantly better than QL, sometimes even
better than PROD. When clicks are used as context, with neural
embeddings, SCEM is much more effective than SCRM3. This shows
that semantic match is more beneficial than exact word match for
top retrieved items in product search. In addition, these embeddings
also carry popularity information, since items purchased more
often in the training data receive more gradient updates during training. Due
to our model structure, the embeddings of items purchased under
similar queries or contexts are more alike than those of non-purchased
items, and the embeddings of clicked and purchased items are also similar.</p>
      <p>The relative improvement of SCEM and SCRM3 compared to
the production model on Toys &amp; Games is less than on the other
two datasets.³ There are two possible reasons. First, the production
model performs better on Toys &amp; Games than on Garden &amp;
Outdoor and Cell Phones &amp; Accessories, which can be seen from
its larger advantage over random re-ranking. Second, the average
number of clicks in the first two and three SERPs in Toys &amp; Games
is smaller than in the other two datasets⁴, since SCEM and SCRM3
perform better with more implicit feedback information.</p>
      <p>³Due to the confidentiality policy, the absolute value of each metric cannot be revealed.</p>
      <p>The relative performance of all the other methods against PROD
is better when re-ranking from page 2 than when re-ranking
from page 3 in terms of all three metrics. There are several reasons.
When purchases happen on the third page or later, it
usually means users could not find the “right” products in the first
two pages, which further indicates the production model is worse
for these query sessions. In addition, the ranking quality of PROD
on the third page is worse than on the second page. Another reason
that SCRM3 and SCEM improve more upon PROD when re-ranking
from page 3 is that more context becomes available with the clicks
collected on the second page, which makes the user preference model
more robust.</p>
      <p>QL performs similarly to RAND on Toys &amp; Games and a little
better than RAND on Garden &amp; Outdoor and Cell Phones &amp;
Accessories, which indicates that relevance captured by exact word
matching is not the key concern in the rank lists of the production
model. In addition, most candidate products are consistent with the
query intent, but the final purchase depends on users’ preferences.
Popularity, as an important factor that consumers consider, can
improve the performance upon QL. However, it is still worse than
the production model most of the time.</p>
      <p>⁴The specific number of average clicks in the datasets cannot be revealed due to the
confidentiality policy.</p>
    </sec>
    <sec id="sec-17">
      <title>Effect of Short-term Context</title>
      <p>We investigate the influence of short-term context by varying the
value of λc with λu set to 0. The performance of SCRM3 and SCEM
varies as the interpolation coefficient of short-term context changes,
since only these two methods utilize the clicks. Since re-ranking
from the second or third page on Toys &amp; Games, Garden &amp;
Outdoor, and Cell Phones &amp; Accessories all shows similar trends,
we only report the performance of each method in the setting of
re-ranking from the second page on Toys &amp; Games, which is shown
in Figure 3a. Figure 3a shows that as the weight of clicks is set larger,
the performance of SCRM3 and SCEM goes up consistently. When
λc is set to 0, SCRM3 and SCEM degenerate to QL and QEM
respectively, which do not incorporate short-term context. From another
perspective, SCRM3 and SCEM degrade in performance as we increase
the weight on queries. For exact word match based methods, more
click signals lead to more improvements for SCRM3, which is also
consistent with the fact that QL performs similarly to RAND by only
considering queries. For embedding-based methods, which capture
semantic match and popularity, QEM with queries alone performs
similarly to PROD but much better when more context information
is incorporated in SCEM. This indicates that users’ clicks already
cover the query intent, and also contain additional users’ preference
information.</p>
    </sec>
    <sec id="sec-18">
      <title>Effect of Long-term Context</title>
      <p>Next we study the effect of long-term context indicated by users’
global representations E (u ), both with and without incorporating
short-term context. LCEM and LCRM3 only use queries and users’
historical transactions for ranking; LSCEM uses long- and short-term
context (λu + λc is fixed at 1 since we found that query embeddings
do not contribute to the re-ranking performance when short-term
context is incorporated). Toys &amp; Games is used again to show the
sensitivity of each model in terms of λu under the setting of
re-ranking from the second page. Since there are users in the test set
who never appear in the training set, λu does not take effect due to
the null representations for these unknown users. In Toys &amp; Games,
only about 13% of all the query sessions in the test set are from
users who also appear in the training set. The performance change
on the entire test set will be small due to the low proportion of
sessions the models can affect, so we also include the performance
of each model on the subset of data entries associated with users
seen in the training set. Figures 3b and 3c show how each method
performs on the whole test set and the subset respectively with
different λu .</p>
      <p>Figures 3b and 3c show that for LSCEM, as λu becomes larger,
performance goes down. This indicates that when short-term
context is used, users’ embeddings act like noise and drag down the
re-ranking performance. λu has different impacts on the models
not using clicks. For LCRM3, when we zoom in to focus only on
users that appear in the training set, the performance changes and
the superiority over QL are more noticeable. The best value of MAP
is achieved when λu = 0.8, which means long-term context benefits
word-based models with additional information, which can be helpful
for solving the word mismatch problem. In contrast, LCEM with
non-zero λu performs worse than only considering queries.
Embedding models already capture semantic similarities between
words. In addition, as we mentioned in Section 5.1, they also carry
information about popularity, since the products purchased more
often under a query get more credit during training. Another
possible reason is that the number of customers with sessions
of similar intent is low, so the user embeddings misguide
the query sessions. Thus, users’ long-term interests do not bring
additional information to further improve LCEM on these collections.</p>
      <p>
          This finding is different from the observation in HEM proposed
by Ai et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
          ], which incorporates user embeddings as users’ long-term
preferences and achieves superior performance compared
to not using user embeddings. We hypothesize that this
inconsistent finding is due to differences in the datasets. HEM was
evaluated on a dataset that is heavily biased toward users with multiple
purchases and under a rather simplistic assumption of query
generation, where the terms from the category hierarchy of a product
are concatenated as the query string. Their datasets contain only
hundreds of unique queries and tens of thousands of items that are all
purchased by multiple users. In contrast, we experimented on
real queries and the corresponding user behavior data extracted from
a search log. The numbers of unique queries and items in our
experiments are hundreds of times larger than in their dataset. There is also
little overlap of users between the training and test sets in our datasets,
while in their experiments, all the users in the test set appear in
the training set.
      </p>
    </sec>
    <sec id="sec-19">
      <title>Effect of Embedding Size</title>
      <p>Figure 3d shows the sensitivity of each model in terms of embedding
size on Toys &amp; Games, which presents similar trends to the other
two datasets. Generally, SCEM and QEM are not sensitive to the
embedding size as long as it is in a reasonable range. To keep the
model effective and simple, we use 100 as the embedding size and
report experimental results under this setting in Table 2 and the
other figures.</p>
    </sec>
    <sec id="sec-20">
      <title>CONCLUSION AND FUTURE WORK</title>
      <p>We reformulate product search as a dynamic ranking problem where
we leverage users’ implicit feedback on the presented products as
short-term context and refine the ranking of the remaining items when
users request the next result pages. We then propose an
end-to-end context-aware neural embedding model to represent various
context dependency assumptions for predicting purchased items.
Our experimental results indicate that incorporating short-term
context is more effective than using long-term context or not using
context at all. It is also shown that our neural context-aware model
performs better than state-of-the-art word-based feedback models.</p>
      <p>For future work, there are several research directions. First, it
would be better to evaluate our short-term context re-ranking model
online, in an interactive setting as each result page can be re-ranked
dynamically. Second, other information sources such as images
and price can be included to extract user preferences from their
feedback. Third, we are interested in the use of negative feedback
such as “skips” that can be identified reliably based on subsequent
user actions.</p>
    </sec>
    <sec id="sec-21">
      <title>ACKNOWLEDGMENTS</title>
      <p>This work was supported in part by the Center for Intelligent
Information Retrieval and in part by NSF IIS-1715095. Any opinions,
findings, and conclusions or recommendations expressed in this
material are those of the authors and do not necessarily reflect
those of the sponsor.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Qingyao</given-names>
            <surname>Ai</surname>
          </string-name>
          , Keping Bi, Jiafeng Guo, and
          <string-name>
            <given-names>W Bruce</given-names>
            <surname>Croft</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Learning a Deep Listwise Context Model for Ranking Refinement</article-title>
          . arXiv preprint arXiv:1804.05936 (
          <year>2018</year>
          ),
          <fpage>135</fpage>
          -
          <lpage>144</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Qingyao</given-names>
            <surname>Ai</surname>
          </string-name>
          , Keping Bi, Cheng Luo, Jiafeng Guo, and
          <string-name>
            <given-names>W Bruce</given-names>
            <surname>Croft</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Unbiased Learning to Rank with Unbiased Propensity Estimation</article-title>
          . arXiv preprint arXiv:1804.05938 (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Qingyao</given-names>
            <surname>Ai</surname>
          </string-name>
          , Yongfeng Zhang, Keping Bi, Xu Chen, and
          <string-name>
            <given-names>W Bruce</given-names>
            <surname>Croft</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Learning a hierarchical embedding model for personalized product search</article-title>
          .
          <source>In Proceedings of the 40th International ACM SIGIR Conference. ACM</source>
          ,
          <volume>645</volume>
          -
          <fpage>654</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Olivier</given-names>
            <surname>Chapelle</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ya</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>A dynamic bayesian network click model for web search ranking</article-title>
          .
          <source>In Proceedings of the 18th international conference on World wide web. ACM</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Nick</given-names>
            <surname>Craswell</surname>
          </string-name>
          , Onno Zoeter,
          <string-name>
            <given-names>Michael</given-names>
            <surname>Taylor</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Bill</given-names>
            <surname>Ramsey</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>An experimental comparison of click position-bias models</article-title>
          .
          <source>In Proceedings of the 2008 WSDM Conference. ACM</source>
          ,
          <fpage>87</fpage>
          -
          <lpage>94</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Wei</given-names>
            <surname>Di</surname>
          </string-name>
          , Anurag Bhardwaj, Vignesh Jagadeesh, Robinson Piramuthu, and
          <string-name>
            <given-names>Elizabeth</given-names>
            <surname>Churchill</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>When relevance is not enough: Promoting visual attractiveness for fashion e-commerce</article-title>
          .
          <source>arXiv preprint arXiv:1406.3561</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Huizhong</given-names>
            <surname>Duan</surname>
          </string-name>
          , ChengXiang Zhai, Jinxing Cheng, and
          <string-name>
            <given-names>Abhishek</given-names>
            <surname>Gattani</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>A probabilistic mixture model for mining and analyzing product search log</article-title>
          .
          <source>In Proceedings of the 22nd ACM international conference on Information &amp; Knowledge Management. ACM</source>
          ,
          <fpage>2179</fpage>
          -
          <lpage>2188</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Huizhong</given-names>
            <surname>Duan</surname>
          </string-name>
          , ChengXiang Zhai, Jinxing Cheng, and
          <string-name>
            <given-names>Abhishek</given-names>
            <surname>Gattani</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Supporting keyword search in product database: a probabilistic approach</article-title>
          .
          <source>Proceedings of the VLDB Endowment 6</source>
          ,
          <issue>14</issue>
          (
          <year>2013</year>
          ),
          <fpage>1786</fpage>
          -
          <lpage>1797</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Georges E</given-names>
            <surname>Dupret</surname>
          </string-name>
          and
          <string-name>
            <given-names>Benjamin</given-names>
            <surname>Piwowarski</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>A user browsing model to predict search engine click data from past observations</article-title>
          .
          <source>In Proceedings of the 31st annual international ACM SIGIR conference. ACM</source>
          ,
          <fpage>331</fpage>
          -
          <lpage>338</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Krista</given-names>
            <surname>Garcia</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>More Product Searches Start on Amazon</article-title>
          . https://retail.emarketer.com/article/more-product-searches-start-on-amazon/5b92c0e0ebd40005bc4dc7ae
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Yangyang</given-names>
            <surname>Guo</surname>
          </string-name>
          , Zhiyong Cheng, Liqiang Nie, Xin-Shun Xu, and
          <string-name>
            <given-names>Mohan</given-names>
            <surname>Kankanhalli</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Multi-modal preference modeling for product search</article-title>
          .
          <source>In 2018 ACM Multimedia Conference on Multimedia Conference. ACM</source>
          ,
          <fpage>1865</fpage>
          -
          <lpage>1873</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Balázs</given-names>
            <surname>Hidasi</surname>
          </string-name>
          , Alexandros Karatzoglou, Linas Baltrunas, and
          <string-name>
            <given-names>Domonkos</given-names>
            <surname>Tikk</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Session-based recommendations with recurrent neural networks</article-title>
          .
          <source>arXiv preprint arXiv:1511.06939</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Balázs</given-names>
            <surname>Hidasi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Domonkos</given-names>
            <surname>Tikk</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>General factorization framework for context-aware recommendations</article-title>
          .
          <source>Data Mining and Knowledge Discovery</source>
          <volume>30</volume>
          ,
          <issue>2</issue>
          (
          <year>2016</year>
          ),
          <fpage>342</fpage>
          -
          <lpage>371</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Yujing</given-names>
            <surname>Hu</surname>
          </string-name>
          , Qing Da,
          <string-name>
            <given-names>Anxiang</given-names>
            <surname>Zeng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yang</given-names>
            <surname>Yu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Yinghui</given-names>
            <surname>Xu</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Reinforcement Learning to Rank in E-Commerce Search Engine: Formalization, Analysis, and Application</article-title>
          .
          <source>arXiv preprint arXiv:1803.00710</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Bernard J</given-names>
            <surname>Jansen</surname>
          </string-name>
          and
          <string-name>
            <given-names>Paulo R</given-names>
            <surname>Molina</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>The effectiveness of Web search engines for retrieving relevant ecommerce links</article-title>
          .
          <source>Information Processing &amp; Management</source>
          <volume>42</volume>
          ,
          <issue>4</issue>
          (
          <year>2006</year>
          ),
          <fpage>1075</fpage>
          -
          <lpage>1098</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Gawesh</given-names>
            <surname>Jawaheer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Peter</given-names>
            <surname>Weller</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Patty</given-names>
            <surname>Kostkova</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Modeling user preferences in recommender systems: A classification framework for explicit and implicit user feedback</article-title>
          .
          <source>ACM Transactions on Interactive Intelligent Systems (TiiS) 4</source>
          ,
          <issue>2</issue>
          (
          <year>2014</year>
          ),
          <fpage>8</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Thorsten</given-names>
            <surname>Joachims</surname>
          </string-name>
          , Laura Granka, Bing Pan, Helene Hembrooke, and
          <string-name>
            <given-names>Geri</given-names>
            <surname>Gay</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Accurately interpreting clickthrough data as implicit feedback</article-title>
          .
          <source>In ACM SIGIR Forum</source>
          , Vol.
          <volume>51</volume>
          . ACM,
          <fpage>4</fpage>
          -
          <lpage>11</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Thorsten</given-names>
            <surname>Joachims</surname>
          </string-name>
          , Adith Swaminathan, and
          <string-name>
            <given-names>Tobias</given-names>
            <surname>Schnabel</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Unbiased learning-to-rank with biased feedback</article-title>
          .
          <source>In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. ACM</source>
          ,
          <fpage>781</fpage>
          -
          <lpage>789</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Shubhra Kanti</given-names>
            <surname>Karmaker Santu</surname>
          </string-name>
          , Parikshit Sondhi, and
          <string-name>
            <given-names>ChengXiang</given-names>
            <surname>Zhai</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>On application of learning to rank for e-commerce search</article-title>
          .
          <source>In Proceedings of the 40th International ACM SIGIR Conference. ACM</source>
          ,
          <fpage>475</fpage>
          -
          <lpage>484</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Diederik P</given-names>
            <surname>Kingma</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jimmy Lei</given-names>
            <surname>Ba</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Adam: A method for stochastic optimization</article-title>
          .
          <source>In Proc. 3rd Int. Conf. Learn. Representations.</source>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Victor</given-names>
            <surname>Lavrenko</surname>
          </string-name>
          and
          <string-name>
            <given-names>W Bruce</given-names>
            <surname>Croft</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Relevance-based language models</article-title>
          .
          <source>In ACM SIGIR Forum</source>
          , Vol.
          <volume>51</volume>
          . ACM,
          <fpage>260</fpage>
          -
          <lpage>267</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Beibei</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Anindya</given-names>
            <surname>Ghose</surname>
          </string-name>
          , and Panagiotis G Ipeirotis.
          <year>2011</year>
          .
          <article-title>Towards a theory model for product search</article-title>
          .
          <source>In Proceedings of the 20th international conference on World wide web. ACM</source>
          ,
          <fpage>327</fpage>
          -
          <lpage>336</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Jing</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Pengjie</given-names>
            <surname>Ren</surname>
          </string-name>
          , Zhumin Chen, Zhaochun Ren, Tao Lian, and Jun Ma.
          <year>2017</year>
          .
          <article-title>Neural attentive session-based recommendation</article-title>
          .
          <source>In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. ACM</source>
          ,
          <fpage>1419</fpage>
          -
          <lpage>1428</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Soon Chong Johnson</given-names>
            <surname>Lim</surname>
          </string-name>
          , Ying Liu, and Wing Bun Lee.
          <year>2010</year>
          .
          <article-title>Multi-facet product information search and retrieval using semantically annotated product family ontology</article-title>
          .
          <source>Information Processing &amp; Management</source>
          <volume>46</volume>
          ,
          <issue>4</issue>
          (
          <year>2010</year>
          ),
          <fpage>479</fpage>
          -
          <lpage>493</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Bo</given-names>
            <surname>Long</surname>
          </string-name>
          , Jiang Bian, Anlei Dong, and
          <string-name>
            <given-names>Yi</given-names>
            <surname>Chang</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Enhancing product search by best-selling prediction in e-commerce</article-title>
          .
          <source>In Proceedings of the 21st ACM CIKM Conference. ACM</source>
          ,
          <fpage>2479</fpage>
          -
          <lpage>2482</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Yuanhua</given-names>
            <surname>Lv</surname>
          </string-name>
          and
          <string-name>
            <given-names>ChengXiang</given-names>
            <surname>Zhai</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Adaptive relevance feedback in information retrieval</article-title>
          .
          <source>In Proceedings of the 18th ACM conference on Information and knowledge management. ACM</source>
          ,
          <fpage>255</fpage>
          -
          <lpage>264</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Nish</given-names>
            <surname>Parikh</surname>
          </string-name>
          and
          <string-name>
            <given-names>Neel</given-names>
            <surname>Sundaresan</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Beyond relevance in marketplace search</article-title>
          .
          <source>In Proceedings of the 20th ACM CIKM Conference. ACM</source>
          ,
          <fpage>2109</fpage>
          -
          <lpage>2112</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Jay M</given-names>
            <surname>Ponte</surname>
          </string-name>
          and
          <string-name>
            <given-names>W Bruce</given-names>
            <surname>Croft</surname>
          </string-name>
          .
          <year>1998</year>
          .
          <article-title>A language modeling approach to information retrieval</article-title>
          .
          <source>In Proceedings of the 21st annual international ACM SIGIR conference. ACM</source>
          ,
          <fpage>275</fpage>
          -
          <lpage>281</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>Massimo</given-names>
            <surname>Quadrana</surname>
          </string-name>
          , Paolo Cremonesi, and
          <string-name>
            <given-names>Dietmar</given-names>
            <surname>Jannach</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Sequence-Aware Recommender Systems</article-title>
          .
          <source>ACM Comput. Surv</source>
          . (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>Massimo</given-names>
            <surname>Quadrana</surname>
          </string-name>
          , Alexandros Karatzoglou, Balázs Hidasi, and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Cremonesi</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Personalizing session-based recommendations with hierarchical recurrent neural networks</article-title>
          .
          <source>In Proceedings of the Eleventh ACM Conference on Recommender Systems. ACM</source>
          ,
          <fpage>130</fpage>
          -
          <lpage>137</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>Navid</given-names>
            <surname>Rekabsaz</surname>
          </string-name>
          , Mihai Lupu, Allan Hanbury, and
          <string-name>
            <given-names>Guido</given-names>
            <surname>Zuccon</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Generalizing translation models in the probabilistic relevance framework</article-title>
          .
          <source>In Proceedings of the 25th ACM CIKM conference. ACM</source>
          ,
          <fpage>711</fpage>
          -
          <lpage>720</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>Steffen</given-names>
            <surname>Rendle</surname>
          </string-name>
          , Zeno Gantner, Christoph Freudenthaler, and
          <string-name>
            <given-names>Lars</given-names>
            <surname>Schmidt-Thieme</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Fast context-aware recommendations with factorization machines</article-title>
          .
          <source>In Proceedings of the 34th international ACM SIGIR Conference. ACM</source>
          ,
          <fpage>635</fpage>
          -
          <lpage>644</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>Joseph John</given-names>
            <surname>Rocchio</surname>
          </string-name>
          .
          <year>1971</year>
          .
          <article-title>Relevance feedback in information retrieval</article-title>
          .
          <source>The Smart retrieval system - experiments in automatic document processing</source>
          (
          <year>1971</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>Khalid</given-names>
            <surname>Saleh</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Global Online Retail Spending - Statistics and Trends</article-title>
          . https://www.invespcro.com/blog/global-online-retail-spending-statistics-and-trends/
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>Gerard</given-names>
            <surname>Salton</surname>
          </string-name>
          , Anita Wong, and
          <string-name>
            <given-names>Chung-Shu</given-names>
            <surname>Yang</surname>
          </string-name>
          .
          <year>1975</year>
          .
          <article-title>A vector space model for automatic indexing</article-title>
          .
          <source>Commun. ACM</source>
          <volume>18</volume>
          ,
          <issue>11</issue>
          (
          <year>1975</year>
          ),
          <fpage>613</fpage>
          -
          <lpage>620</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>Michael</given-names>
            <surname>Taylor</surname>
          </string-name>
          , John Guiver, Stephen Robertson, and Tom Minka.
          <year>2008</year>
          .
          <article-title>Softrank: optimizing non-smooth rank metrics</article-title>
          .
          <source>In Proceedings of the 2008 International Conference on Web Search and Data Mining. ACM</source>
          ,
          <fpage>77</fpage>
          -
          <lpage>86</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>Bartłomiej</given-names>
            <surname>Twardowski</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Modelling contextual information in session-aware recommender systems with neural networks</article-title>
          .
          <source>In Proceedings of the 10th ACM Conference on Recommender Systems. ACM</source>
          ,
          <fpage>273</fpage>
          -
          <lpage>276</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>Christophe</given-names>
            <surname>Van Gysel</surname>
          </string-name>
          , Maarten de Rijke, and
          <string-name>
            <given-names>Evangelos</given-names>
            <surname>Kanoulas</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Learning latent vector spaces for product search</article-title>
          .
          <source>In Proceedings of the 25th ACM CIKM Conference. ACM</source>
          ,
          <fpage>165</fpage>
          -
          <lpage>174</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>Damir</given-names>
            <surname>Vandic</surname>
          </string-name>
          , Flavius Frasincar, and
          <string-name>
            <given-names>Uzay</given-names>
            <surname>Kaymak</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Facet selection algorithms for web product search</article-title>
          .
          <source>In Proceedings of the 22nd ACM CIKM Conference. ACM</source>
          ,
          <fpage>2327</fpage>
          -
          <lpage>2332</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>Xuanhui</given-names>
            <surname>Wang</surname>
          </string-name>
          , Nadav Golbandi, Michael Bendersky, Donald Metzler, and
          <string-name>
            <given-names>Marc</given-names>
            <surname>Najork</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Position bias estimation for unbiased learning to rank in personal search</article-title>
          .
          <source>In Proceedings of the Eleventh ACM WSDM Conference. ACM</source>
          ,
          <fpage>610</fpage>
          -
          <lpage>618</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>Chen</given-names>
            <surname>Wu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ming</given-names>
            <surname>Yan</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Session-aware information embedding for ecommerce product recommendation</article-title>
          .
          <source>In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. ACM</source>
          ,
          <fpage>2379</fpage>
          -
          <lpage>2382</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>Liang</given-names>
            <surname>Wu</surname>
          </string-name>
          , Diane Hu, Liangjie Hong, and Huan Liu.
          <year>2018</year>
          .
          <article-title>Turning Clicks into Purchases: Revenue Optimization for Product Search in E-Commerce</article-title>
          .
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>Fen</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Tie-Yan</given-names>
            <surname>Liu</surname>
          </string-name>
          , Jue Wang, Wensheng Zhang, and
          <string-name>
            <given-names>Hang</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Listwise approach to learning to rank: theory and algorithm</article-title>
          .
          <source>In Proceedings of the 25th international conference on Machine learning. ACM</source>
          ,
          <fpage>1192</fpage>
          -
          <lpage>1199</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>Jun</given-names>
            <surname>Yu</surname>
          </string-name>
          , Sunil Mohan, Duangmanee Pew Putthividhya, and
          <string-name>
            <given-names>Weng-Keen</given-names>
            <surname>Wong</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Latent dirichlet allocation based diversified retrieval for e-commerce search</article-title>
          .
          <source>In Proceedings of the 7th ACM WSDM Conference. ACM</source>
          ,
          <fpage>463</fpage>
          -
          <lpage>472</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>Yisong</given-names>
            <surname>Yue</surname>
          </string-name>
          and
          <string-name>
            <given-names>Thorsten</given-names>
            <surname>Joachims</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Interactively optimizing information retrieval systems as a dueling bandits problem</article-title>
          .
          <source>In Proceedings of the 26th Annual International Conference on Machine Learning. ACM</source>
          ,
          <fpage>1201</fpage>
          -
          <lpage>1208</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          [46]
          <string-name>
            <given-names>Hamed</given-names>
            <surname>Zamani</surname>
          </string-name>
          and
          <string-name>
            <given-names>W Bruce</given-names>
            <surname>Croft</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Embedding-based query language models</article-title>
          .
          <source>In Proceedings of the 2016 ACM ICTIR conference. ACM</source>
          ,
          <fpage>147</fpage>
          -
          <lpage>156</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>