=Paper= {{Paper |id=Vol-2319/paper25 |storemode=property |title=Towards Optimization of e-Commerce Search and Discovery |pdfUrl=https://ceur-ws.org/Vol-2319/paper25.pdf |volume=Vol-2319 |authors=Anjan Goswami,Chengxiang Zhai,Prasant Mohapatra |dblpUrl=https://dblp.org/rec/conf/sigir/GoswamiZM18 }} ==Towards Optimization of e-Commerce Search and Discovery== https://ceur-ws.org/Vol-2319/paper25.pdf
Towards Optimization of E-Commerce Search and Discovery

Anjan Goswami, University of California, Davis, agoswami@ucdavis.edu
ChengXiang Zhai, University of Illinois Urbana-Champaign, czhai@illinois.edu
Prasant Mohapatra, University of California, Davis, pmohapatra@ucdavis.edu


ABSTRACT

E-Commerce (E-Com) search is an emerging problem with multiple new challenges. One of the primary challenges is optimizing multiple objectives involving business metrics such as sales and revenue while maintaining a discovery strategy for the site. In this paper, we formalize the e-com search problem for optimizing metrics based on sales, revenue, and relevance. We define a notion of item discoverability in search and show that learning to rank (LTR) algorithms trained with behavioral features from e-com customer interactions (e.g., clicks, cart-adds, orders) do not by themselves address the discoverability problem. Instead, a suitable explore-exploit framework must be integrated with the ranking algorithm. We thus construct a practical discovery strategy by keeping a few top positions for discovery and populating them with items selected through exploration. Then, we present a few exploration strategies with low regret bounds in terms of business metrics. We conduct a simulation study with a synthetically generated dataset that represents items with different utility distributions and compare these strategies using metrics based on sales, revenue, relevance, and discovery. We find that a strategy based on an adaptive submodular discovery framework can provide a good balance of business metrics and discoverability compared to other strategies based on random exploration or multi-armed bandits. However, another strategy, based on a monotonic submodular optimization function that is integrated with linear LTR models, also works well for discovery and performs well with respect to sales, revenue, and relevance.

Categories and Subject Descriptors

H.3.3 [Information Search and Retrieval]: E-Com Search

General Terms

Algorithms

Keywords

e-com search, retrieval models, learning to rank, discoverability, exploration-exploitation

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Copyright © 2018 by the paper's authors. Copying permitted for private and academic purposes.
In: J. Degenhardt, G. Di Fabbrizio, S. Kallumadi, M. Kumar, Y.-C. Lin, A. Trotman, H. Zhao (eds.): Proceedings of the SIGIR 2018 eCom workshop, 12 July, 2018, Ann Arbor, Michigan, USA, published at http://ceur-ws.org

1.   INTRODUCTION

   One of the most critical components of an e-commerce (e-com) marketplace is its search functionality. The goal of an e-commerce search engine is to help buyers find and purchase their desired products. The buyers' intent represents the demand side, and the products listed by the sellers represent the supply side of an e-com platform. The e-com search engine's function can thus be considered to provide an efficient matching platform for the supply and demand sides and to facilitate transactions that generate revenue. Like a web search engine, an e-com search engine also intends to ensure user satisfaction by presenting the most relevant items that a user is searching for. However, an e-com search engine generates revenue directly based on the products presented to the users; this is in sharp contrast with how a web search engine generates revenue through advertising, and it raises significant new challenges for designing effective search algorithms. Thus, besides the usual goal of maximizing the relevance of the search results, an e-com search engine also needs to optimize two additional goals that generally do not exist in web search scenarios, i.e., it has to 1) maximize business metrics such as revenue or sales from purchases and 2) minimize inventory cost by selling items faster. Both objectives depend on the performance of the underlying ranking algorithm of the search engine, in addition to assortment selection and demand generation. The first goal of maximizing the business metrics can be achieved by constructing a rank function using a learning to rank framework with a suitable optimization objective that is in line with the business goals of the e-commerce company. The second goal, however, requires e-commerce sites to have a discovery strategy so that they can expose as many items to the customers as quickly as possible and determine which products are selling. This allows the companies to readjust their assortment strategy to optimize the inventory and associated costs. Discovery strategies, however, can hurt the business goals, since they require showing previously unexplored items to the customers. In e-commerce, learning to rank functions are trained by constructing mainly two types of features: (a) features that arise from the static contents of the product description and (b) features that come from behavioral signals such as clicks, cart-adds, sales, and review ratings. It has been observed [16, 21] that the second group of features is more useful for finding a comparative relevance
among the items. However, the use of machine learning algorithms in search engines tends to bias a search engine toward favoring the items already viewed by users, due to the use of features that are computed based on user behavior such as clicks, rates of "add to cart", etc. Since a learning algorithm would rank items that have already attracted many clicks at the top, it might overfit to promote the items viewed by users. As a result, some items might never have an opportunity to be shown to a user (i.e., "discovered" by a user), thus also losing the opportunity to potentially gain clicks. Such "undiscovered" products would then have to stay in the inventory for a long time, incurring extra cost and hurting the satisfaction of the product providers. Hence, it is important to consider integrating the aspect of discovery with the ranking mechanism for e-commerce search. In this paper, we provide a practical yet theoretically sound framework for learning to rank where it is possible to have a regulated discovery mechanism minimizing the loss in revenue, sales, or relevance metrics.

   The key contributions of this paper can be described as follows: (1) This is the first paper that formalizes an e-com search problem as a multi-objective LTR problem with continuous exploration, which we call the learning to rank and discover problem. (2) We discuss how we can construct a solution to this problem using the existing frameworks of multi-armed bandits and auction mechanisms. (3) We also provide a novel paradigm combining algorithms from multi-objective optimization and explore-exploit paradigms.

2.   BACKGROUND

2.1   LTR and E-Commerce Search

   Learning to rank (LTR) algorithms [19] involve learning a function for optimizing a relevance measure using an offline labeled training data set. This function can be a neural net, a boosted tree [3, 22], a support vector machine [15], or a linear model [13]. Common ranking measures [14] include normalized discounted cumulative gain (NDCG), mean reciprocal rank (MRR), and mean average precision (MAP). The loss functions based on these ranking measures are mostly non-smooth; however, researchers [5, 17] have derived LTR algorithms such as LambdaMART [4] to address such problems. There is also work on multi-objective learning to rank [22] that uses a click rate along with the human-judged ratings to augment the computation of the gradient direction from pairwise training data in the LambdaMART algorithm. This algorithm can potentially be modified to learn to maximize multiple graded objectives where the priorities are known in advance. There are not many papers on LTR algorithms applied to e-com data. Recently, researchers from Walmart [16] conducted experiments with multiple LTR algorithms on e-com data. In their observation, the LambdaMART algorithm with a sales and click rate based objective worked well for their data set. In a separate paper, researchers from Amazon [21] reported the use of gradient boosted tree based regression methods [8] for an LTR application. They also reported using various normalization techniques to avoid the biases generated by the heavy reliance on historical behavioral data in regression.

2.2   Online Learning and Multi-Armed Bandits

   Multi-armed bandit (MAB) algorithms [11, 20] have a rich literature. The algorithms that are commonly used for exploration include ε-greedy [23] and variants of the upper confidence bound (UCB) algorithm [2]. Additionally, the K-armed dueling bandit framework [25] has been developed for finding the optimal bandit in a scenario where each bandit can be considered to optimize a specific objective. It is possible to frame a multi-objective problem with an exploration component using such a paradigm. In this paper, we aim to construct a framework that is simpler than the K-armed bandit but more practical.

3.   OPTIMIZATION OF E-COM SEARCH

   We consider the problem of optimization of an e-com search engine over a period of time {1, ..., T}. Let us assume that there are N queries, denoted by Q = (q1, q2, ..., qN), that can be sent to the search engine during this time. Let Z = {ζ1, ..., ζM} be the set of M items, and let the corresponding prices of the items be given by {ρ1, ..., ρM}. Now, consider a rank function π, where π(qi, t) outputs the top K items, a permutation from the set Z. Next, consider two binary random variables Λ^S_ijt and Λ^C_ijt, where the first denotes whether a sale happens and the second denotes whether there is a click for an item ζj given a query qi at time t. Given these two binary random variables, we can define the probability of a sale for a query qi and item ζj at time t as p(Λ^S_ijt = 1 | qi, ζj, t), and similarly the probability of a click as p(Λ^C_ijt = 1 | qi, ζj, t). Let us also define a normalization constant Γ that denotes the total number of search sessions during the time period of this study.

   Now, we can define several objective functions for the e-commerce search engine, such as the following.

   Conversion rate per visit (CRV):

      g_CRV = (1/Γ) Σ_{t=1..T} Σ_{i=1..N} Σ_{j=1..M} p(Λ^S_ijt = 1 | qi, ζj, t)

   Revenue per visit (RPV):

      g_RPV = (1/Γ) Σ_{t=1..T} Σ_{i=1..N} Σ_{j=1..M} p(Λ^S_ijt = 1 | qi, ζj, t) ρj

   We now also define a relevance based objective as follows:

      g_REL(π) = (1/Γ) Σ_{t=1..T} Σ_{i=1..N} RM(π(qi, t))

where RM can be any relevance measure such as normalized discounted cumulative gain (NDCG), which is generally defined based on how well the ranked list π(q) matches the ideal ranking produced based on human annotations of the relevance of each item to the query. The aggregation function does not have to be a sum (over all the queries); it can also be, e.g., the minimum of the relevance measure over all the queries, which would allow us to encode the preference for avoiding any query with a very low relevance measure score. Note that it is possible to define the NDCG based on ratings created from direct user feedback such as clicks, add-to-cart, or sales, since comparing items based on relevance in e-commerce can be hard [21]. In this paper, we assume that the relevance function uses a rating system without any specific notion of how that rating system is constructed.
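To make the normalization by Γ concrete, here is a minimal Python sketch of computing g_CRV and g_RPV from estimated sale probabilities; the nested mapping layout p_sale[t][i][j] and the function name are our own illustrative assumptions, not part of the paper's formulation:

```python
# Sketch of the g_CRV and g_RPV objectives: sum (price-weighted)
# sale probabilities over time steps, queries, and items, then
# normalize by the number of search sessions Gamma. The nested-dict
# layout p_sale[t][i][j] is illustrative, not prescribed by the paper.

def crv_rpv(p_sale, prices, gamma):
    g_crv, g_rpv = 0.0, 0.0
    for per_query in p_sale.values():            # time steps t = 1..T
        for per_item in per_query.values():      # queries q_1..q_N
            for j, p in per_item.items():        # items zeta_1..zeta_M
                g_crv += p                       # p(Lambda^S_ijt = 1 | q_i, zeta_j, t)
                g_rpv += p * prices[j]           # weighted by price rho_j
    return g_crv / gamma, g_rpv / gamma
```

With Γ set to the number of search sessions, g_RPV reduces to g_CRV when every price is 1, which is a quick sanity check for an implementation.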
With this generality, we can define the following objective for relevance:

      g_REL(π) = min_{i ∈ [1,N], t ∈ [1,T]} RM(π(qi, t))

   Now, we define a notion of discovery for an e-com engine. To formalize the notion of discoverability, we say that the LTR function f is β-discoverable if all items are shown at least β times. We can further define the β-discoverability rate as the percentage of items that are impressed at least β times in a fixed period of time. Let us define a binary variable γi for every item ζi and assume that γi = 1 if the item was shown in the search results at least β times, and γi = 0 otherwise. We can express this as follows:

      g_{β-discoverability} = ( Σ_{i=1..|Z|} γi ) / |Z|

   We can further define another objective, the sales through rate (STR), to denote the number of items sold in a period of time as a percentage of the inventory. This objective represents the efficiency of selling items faster. Let us again define a binary variable λs for every item ζs and assume that λs = 1 if the item was sold, and λs = 0 if the item was not sold. Thus, the STR objective function when using policy π can be written as follows:

      g_STR(π) = Σ_{i=1..|Z|} λi

   Now, we define the objective of an e-com search engine as the following:

      maximize g_CRV, g_RPV, g_REL, g_{β-discoverability}, g_STR

   This is a multi-objective optimization problem with an exploration component in the form of the discovery objective. The priorities of these objectives can be business dependent. For example, a business aiming to maximize gross revenue can prioritize maximizing revenue; on the other hand, a business aiming to improve sales growth may want to maximize conversions. Similarly, a business that depends on periodically bringing new items into the inventory may want to prioritize the discovery objective along with sales and revenue. In most real scenarios, such business priorities come from the strategic directions of an e-commerce platform.

   We now discuss the strategies for solving this optimization problem.

4.   STRATEGIES FOR SOLVING THE OPTIMIZATION PROBLEM

   Since there are multiple objectives to optimize, it is impossible to directly apply an existing learning to rank (LTR) method to optimize all the objectives. However, there are multiple ways to extend an LTR method to solve the problem, as we discuss below.

4.1   Direct Extension of LTR

   One simple strategy is to combine the multiple objectives with appropriate weighting parameters so as to form one single objective function, which can then be used in a traditional LTR framework to find a ranking that would optimize the consolidated objective function. The weights can be based on the business priorities. The advantage of this approach is that we can directly build on the existing LTR framework, though the new objective function would pose new challenges in designing effective and efficient optimization algorithms to actually compute optimal rankings. One disadvantage of this strategy is that we cannot easily control the tradeoff between different objectives (e.g., we sometimes may want to set a lower bound on one objective rather than maximize it). To address this limitation, we may allow some objectives to be treated as constraints.

4.2   Incremental Optimization of e-Com Search

   An alternative strategy is to take an existing LTR ranking function as a basis and seek to improve the ranking (e.g., by perturbation) so as to optimize multiple objectives as described above; such an incremental optimization strategy is more tractable, as we will be searching for solutions to the optimization problem in the neighborhood of an existing ranking function. Specifically, we instantiate the optimization problem discussed in the previous section by assuming that the engine incrementally perturbs a learning to rank function slightly, in a way that the original relevance does not vary too much, and that each function in π = (f1, ..., fT) from time step t = 1 to t = T gradually improves upon the multiple objectives. Intuitively, we start with a function f and generate the next function in the "neighborhood" of the existing ranking function f within a relevance bound, and we take each step in a way that attempts to improve the sales and revenue with high probability while also trying to achieve better β-discoverability and STR for the items in the inventory. However, the search space of all such ranking functions is still intractable, and we need a heuristic to obtain a "good" Pareto optimal solution for the multi-objective optimization problem.

   We thus need to construct an optimization framework that integrates an exploration mechanism with a learning to rank paradigm.

4.3   Discovery Framework

   One way of constructing a framework that optimizes an e-com site for multiple business objectives along with an exploration option is to use a fixed number of rank positions, say x, for exploration and use the rest of the top (K − x) positions for exploitation. In this set-up, we can construct an explore set of items E in which all the items have been shown fewer than β times. We then select the top x candidates for a query qi from the intersection of the recall set of items for the query and the set E. Let us call this set corresponding to query qi as Ei. The items selected for the rest of the (K − x) positions can be based on a suitable multi-objective LTR ranking function. This framework is practical, and the cost of the exploration can be estimated by the loss of revenue from not selecting the most optimal items for the x positions. Note that it is also possible to select any rank positions to show the explored items. However, in this paper, let us assume that we keep those x items at the top, to keep our framework simple to analyze.

4.4   LTR and Discoverability

   Now that we have a formal model for the e-commerce search optimization problem, we want to establish the following point:

   • An LTR function is not optimal for the discoverability criteria.
     In an e-com site, only the top K items are shown to the customers. Suppose there is an item ζ that is relevant to a set of queries Q. Now, assume that for each query q ∈ Q there are K other relevant old items for which f(X2) > 0. Assuming that the K items are equally relevant, we can say that ζ has the same value of f(X1) as the K old items. Then, ζ will always rank below the top K items for the set of queries in Q. Suppose the current number of impressions for ζ is less than β; then it has no chance of receiving impressions under the ranking framework using function f, because it will always be ranked below the top K items unless there is a new query that ranks this item higher. This means that as long as we can find a percentage of items that receive fewer than β impressions, it is hard to improve the β-discoverability using a traditional LTR mechanism without any integrated discovery mechanism.

5.   DISCOVERY STRATEGIES

   In this section, we propose several strategies to select the x items for discovery that aim to minimize the loss of business metrics.

5.1   Auction Based Strategy

   In this strategy, the items for the top x positions can be assigned based on an auction strategy where sellers can bid for the positions. This strategy has been used in sponsored search by search engine companies such as Google, and in sponsored products by e-commerce companies such as Amazon. The design of an auction mechanism that balances revenue and relevance is a complex subject and has been discussed in recent papers by Ghose et al. [10] and Athey et al. [1]. However, many e-commerce businesses may not want to use an auction strategy to aid discovery, since it can hurt the growth of seller participation on those sites.

5.2   Exploration with LTR (eLTR)

   This is a strategy discussed in a recent paper by Goswami et al. [12]. In this strategy, we define the set from which the LTR function selects the items as Li ⊂ Ri for a given query qi. We assume that all the items outside the set Li are not β-discoverable. Then, L = ∪_{i=1..N} Li is the set of all β-discoverable items. Hence, the set E = R \ L consists of all the items that require exploration.

   Goswami et al. [12] discuss three strategies to incorporate discovery in an e-commerce search.

   Random selection based exploration from the recall set (RSE): This is a baseline strategy for continuous exploration with an LTR algorithm. In this, for every query qi, we randomly select x items from the set E ∩ Ri. Then, we

on sales over impression for the query. If an item ζj is in the set E ∩ Ri, is shown bj times in T iterations, and is sold aj times in between, then the UCB score of the item ζj is

      ucb_j = aj/bj + sqrt( (2 log T) / (bj + 1) )

Note that this is for a specific query. We then select x items based on the top UCB scores.

   Explore LTR (eLTR): In this, we define a function that we call explore LTR (eLTR) to select the x items. The rest of the items for the top K can be chosen using the traditional LTR. Then, we can either keep the x items on top or rerank all K items based on eLTR.

   The main motivation for eLTR is the observation that there is inherent overfitting in the regular ranking function used in an e-com search engine that hinders exploration, i.e., hinders improvement of β-discoverability and STR. The overfitting is mainly caused by a subset of features derived from user interaction data. Such features are very important as they help in inferring a user's preferences, and the overfitting is actually desirable for many repeated queries and for items that have sustained interest from users (since they are "test cases" that have occurred in the training data), but it gives old items a biased advantage over the new items, limiting the increase of β-discoverability and STR. The main idea behind eLTR is thus to separate such features and introduce a parameter to restrict their influence on the ranking function, indirectly promoting STR. Formally, we note that in general, a ranking function can be written in the following form:

      y = f(X) = g(f1(X1), f2(X2))

where y ∈ R denotes a ranking score, X ∈ R^N is an N-dimensional feature vector, and X1 ∈ R^{N1} and X2 ∈ R^{N2} are two different groups of features such that N1 + N2 = N and X1 ∪ X2 = X. The two groups of features are meant to distinguish features that are unbiased (e.g., content matching features) from those that are inherently biased (e.g., clickthrough-based features). Here g is an aggregation function which is monotonic with respect to both arguments. It is easy to show that any linear model can be written as a monotonic aggregation function. It is not possible to use such a representation for models such as additive trees. However, our previous techniques do not have such a limitation, since they are completely separated from the LTR. In this paper, we keep our discussion limited to linear models. We now define the explore LTR (eLTR) function as follows:

      y^e = f_e(X) = g(f1(X1), ε × f2(X2))

where y^e ∈ R and 0 ≤ ε ≤ 1 is a variable in our algorithmic framework. Since g is monotonic, f_e(X) ≤ f(X) when ε ≤ 1. Since the feature set X2 is a biased feature set favoring old items, we can expect a ranking based on f_e to be more in favor of new items in comparison with the original f, achieving the goal of emphasizing exploration of new items. Note that ε controls the amount of exploration: the smaller ε
put these x items on top of the other (k − x) items that are        is, the more exploration (at the cost of exploitation). Since
selected using LTR from the set Ri . The regret here will be        the maximum exploration is achieved when =0, in which
linear with the number of search sessions with exploration.         case, ranking is entirely relying on f1 , the only loss in the
    Upper confidence bound (UCB) based exploration                  original objective function is incurred by the removal of f2 .
from the recall set (UCBE): This is another simple strat-           By controlling what features to be included in f2 , we can
egy that uses a variant of UCB based algorithm for explo-           control the upper bound of the loss. In this sense, eLTR
ration instead of random sampling. Here, we maintain a              ensures a “safe” exploration strategy since f1 is always active.
MAB for each query. We consider each item in the set                Note, this composed function gradually becomes the LTR
E ∩ Ri as an arm for the MAB corresponding to a query               function when  tends to 1. There can be various ways of
qi . We maintain an UCB score for each of those items based         constructing the  such as the following:
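To make the construction concrete, here is a minimal sketch of an eLTR scorer, assuming linear sub-models f1 and f2 (consistent with the paper's restriction to linear models) and an additive aggregation g. The function name and the additive choice of g are illustrative assumptions, not the authors' implementation; with additive g and a nonnegative biased component, ε ≤ 1 indeed gives fe(X) ≤ f(X).

```python
def eltr_score(x1, w1, x2, w2, epsilon):
    """Hypothetical eLTR scorer with additive g and linear f1, f2.

    x1, w1 -- unbiased features (e.g., content matching) and their weights
    x2, w2 -- biased features (e.g., clickthrough-based) and their weights
    epsilon -- exploration knob in [0, 1]; smaller means more exploration
    """
    f1 = sum(w * x for w, x in zip(w1, x1))  # unbiased component, always active
    f2 = sum(w * x for w, x in zip(w2, x2))  # biased component, down-weighted
    return f1 + epsilon * f2                 # y_e = g(f1(X1), eps * f2(X2))
```

At ε = 1 this recovers the original ranking function; at ε = 0 it ranks purely by the unbiased features.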
There can be various ways of constructing ε, such as the following:
   eLTR basic exploration (eLTRb): In this strategy, we keep ε = I/Tmax. Here, I is the iteration index and Tmax is the maximum number of iterations, after which everything can be reset. This is a very simple strategy in which eLTR just increases the importance of the behavioral features gradually with every iteration.
   eLTR ucb weighted exploration (eLTRu): In this strategy, we keep ε = ucbj/Uj. Here, Uj is a normalization factor, and in our experiment it is chosen to be the maximum UCB score in the set E ∩ Ri.
   eLTR ucb weighted exploration and reranking (eLTRur): This strategy first selects the top x items using eLTRu, selects the remaining (k − x) items using the classic LTR, and then reranks the k items using eLTRu.
   Goswami et al. [12] show in their paper that eLTR can be considered a monotonic sub-modular function and can thus be approximated by a greedy algorithm.

5.3    Adaptive submodular function based eLTR (AeLTR)

   This is an extension of the eLTR mechanism presented by Goswami et al. [12]. However, it makes the strategy much more general by allowing us to use any LTR rank function, going beyond the limitation of linear LTR models. In this, we use the following function:

                  ya = fa(X) = εa × f(X)

Here, εa is a function of the UCB score as follows:

                  εa = ucbj / Uj

This algorithm has been discussed in a paper by Gabillon et al. [9] and is proven to also be a monotonic sub-modular function. Hence, a simple greedy algorithm gives a good approximation for this case as well. We do not, however, know how this strategy compares in terms of losses and the discovery metric with respect to the other approaches discussed in this paper.

5.4    Regret of the eLTR strategies

   The RSE strategy has linear regret since it is random. The regret for it can be arbitrarily bad and can thus be given by O(Rl). The regret for the UCBE algorithm can be given by O(log(Rl)). Both eLTR and AeLTR have similar regret because they both use the monotonic sub-modularity property. These algorithms pull items using a score that is a function of the LTR score and are expected to provide a better loss compared to the approaches that are based on multi-armed bandit algorithms. However, it is not immediately clear how these algorithms can optimize the discoverability objective. The regret of the eLTR group of algorithms can be estimated to be O(1 + 1/e − |E|/x) times worse than the optimal, which comes from their monotonic sub-modularity property. The eLTR algorithms can be considered similar to ε-greedy approaches, where the regret bound is not better than that of the UCB algorithm. However, in our simulation, we want to study the discoverability metrics for such eLTR algorithms.

6.    EVALUATION METHODOLOGY

   Due to the involvement of multiple objectives, the evaluation of E-Com search algorithms also presents new challenges. Here we discuss some ideas for evaluating the proposed eLTR algorithm, which we hope will stimulate more work in this direction.
   Given a set of queries and an initial ranking function, an eLTR method is expected to "discover" the ideal ranking for each query over a sequence of iterations. The ideal ranking may be defined as one that maximizes business metrics such as sales, revenue, etc. Since exploration is involved, one can expect the quality of ranking (NDCG), and consequently sales/revenue, to fluctuate for individual queries during the discovery process. However, the benefit of an eLTR method lies in its ability to deliver greater aggregate sales/revenue over some reasonable number of iterations N, compared to a non-discovery LTR method. We must thus compare our methods on the iterative growth in quality of ranking and the aggregate gain in business metrics.
   The ideal approach for conducting such an evaluation would require simultaneously deploying all candidate methods to live user traffic and computing various user engagement metrics such as click-through rate, sales, revenue, etc. However, this strategy is difficult to implement in practice. Since the user traffic received by each candidate method is different, we need to direct a substantial amount of traffic to each method to make observations comparable and conclusions statistically significant. Deployment of a large number of experimental and likely sub-optimal ranking functions, especially when evaluating baselines, can result in significant business losses for e-commerce search engines. More importantly, such an approach of using live user traffic would not allow a fair comparison of any future algorithm with the ones that we test today, so it does not facilitate further study of the problem.
   Perhaps a good and feasible strategy is to design a simulation-based counterfactual evaluation method [18]. Here, unlike a traditional static test collection, we also have to introduce models to simulate user behavior and model the biases observed in a real system. This can be a research topic in itself, and we omit further discussion as it is out of scope for this short paper.

7.    EXPERIMENTAL RESULTS

   In this section, we first construct a synthetic historical dataset with queries, items, and their prices. We also generate the true purchase probabilities and utility scores for the item and query pairs. Additionally, we use a specific rank function to simulate the behavior of a trained LTR model.
   Then we conduct a simulation as described in section ?? with various exploration strategies. During the simulation, we use the observed purchase probabilities estimated from the purchase feedback as the most important feature for the rank function, but we use the true probabilities generated during the initial data generation phase to simulate the user behavior.
   The main goal of this experimental study is to evaluate the behavior of the exploration strategies with different distributions of the utility scores, representing different states of the inventory in an e-com company.
   We evaluate our algorithms by running the simulation for T iterations. We compute RPV and β-discoverability at the end of T iterations. We also compute a purchase-based mean reciprocal rank [6] metric (MRR). This metric is computed by summing the reciprocal ranks of all the items that are purchased in various user visits for all queries.
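As a concrete illustration, the purchase-based MRR just described can be sketched as follows. The function name is hypothetical, and averaging over visits is an assumption (the text only specifies summing reciprocal ranks of purchased items).

```python
def purchase_mrr(purchase_ranks):
    """Purchase-based MRR over user visits.

    purchase_ranks -- 1-based ranks at which an item was purchased,
    collected over all user visits and all queries; visits with no
    purchase are simply absent from the list.
    """
    if not purchase_ranks:
        return 0.0
    # Sum of reciprocal ranks, here normalized by the number of purchases.
    return sum(1.0 / r for r in purchase_ranks) / len(purchase_ranks)
```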
Moreover, we also discretize our gold utility score between 1 and 5 and generate a rating for each item. This also allows us to compute a mean NDCG score at the k-th position for all the search sessions as a relevance metric.
   We expect the RPV and NDCG of the LTR function to be the best; however, the β-discoverability values will be better for ranking policies that use an exploration strategy. The new ranking strategies will incur a loss in RPV and NDCG, and based on our theoretical analysis, we expect the eLTR methods to have less loss in those measures compared to the RSE and UCB based approaches. We also expect to see a loss in MRR for all exploration methods. However, we are mainly interested in observing how these algorithms perform on the β-discoverability metric compared to LTR.

7.1    Synthetic data generation

   We first generate a set of N queries and M items. We then assign the prices of the items by randomly drawing a number between a minimum and a maximum price from a multi-modal Gaussian distribution that can have 1 to 10 peaks for a query. We select the specific number of peaks for a query uniformly at random. We also assign a subset of the peak prices to a query to be considered as the set of its preferred prices. This creates a situation where every query may have a few preferred price peak points at which it may also have sales or revenue peaks.
   Now that we have the items and queries defined, we randomly generate a utility score, denoted by uij, for every item ζj for a query qi. In our setup, we use uniform random, Gaussian, and long-tailed distributions for selecting the utilities. These three distributions represent three scenarios for a typical e-com company's inventory. Additionally, we generate a purchase probability between 0.02 and 0.40 for every item for every query. We generate these probabilities such that they correlate with the utility scores, with a statistically significant (p-value less than 0.10) Pearson correlation coefficient [24]. We also intend to correlate the purchase probability with the preferred peak prices for a query. Hence, we give an additive boost between 0 and 0.1 to the purchase probability in proportion to the absolute difference of the price of the item from the closest preferred mean price for that query. By generating the purchase probabilities in this way, we ensure that the actual purchase probabilities are related to the preferred prices for the queries, and also to the utility scores of the items for a given query. Next, we define a β-discoverability rate β = b and select b × M items randomly from the set of all items. In our simulation, we assume that the estimated (observed) purchase probability for all the items in the set E is zero at the beginning. The purchase probabilities of the rest of the items are assumed to be estimated correctly at the beginning. Now, we create a simple rank function that is a weighted linear function of the utility score (u), the observed purchase probability (po), and the normalized absolute difference between the product price and the closest preferred mean price (m̂p) for the query, such that l = 0.60po + 0.20u + 0.20m̂p. Here l denotes the score of the ranker. This ranker simulates a trained LTR function in e-com search, where sales can usually be considered the most valuable behavioral signal.
   We now construct a user model. Here, a user browses through the search results one after another from the top and can purchase an item based on that item's purchase probability for that query. Note that, in order to keep the simulation simple, we consider that a user purchases only one item in one visit and leaves the site after that. We also apply a logarithmic discount to the probability of purchase for each lower rank by multiplying by 1/log2(r + 1), where r is the ranking position of the item. This creates the effect of position bias [7].

7.2    Description of the experimental study

   We conduct two sets of experiments with this simulated data.
   In the first set of experiments, we use a Gaussian distribution with mean 0.5 and variance 0.1 for generating the utility scores; everything else is as described above. This means that the utility of the items in the inventory follows a normal distribution. Table 1 shows the final metrics for all the algorithms. We observe that with a Gaussian distribution of utility scores the eLTR approaches have better MRR and β-discoverability. We observe that the AeLTR approach works well, producing good MRR while its β-discoverability remains better than LTR. This is a powerful result, since it shows that it is possible to integrate this approach with more sophisticated LTR algorithms.

        Algorithms     RPV     NDCG     MRR     β−d
        LTR            0.12    0.96     0.54    0.10
        RSE            0.06    0.76     0.21    0.78
        UCBE           0.08    0.87     0.28    0.66
        eLTRb          0.09    0.94     0.33    0.67
        eLTRu          0.09    0.94     0.33    0.67
        eLTRur         0.09    0.94     0.33    0.67
        AeLTR          0.10    0.95     0.40    0.51

Table 1: Simulation of the eLTR framework, with |Q| = 100, |Z| = 5000, |L| = 200, β−d = 10%, β = 50, K = 6, x = 3, T = 50000.

   In the second set of experiments, we use a power law to generate the utility distribution. This means that only a small set of items can be considered valuable in this scenario. Table 2 shows the final metrics for this case. We notice that even with this distribution of utility scores the eLTR variants have a smaller loss in RPV, NDCG, and MRR. Note that with this distribution, discoverability can be considered naturally less useful, since a large number of items are not that valuable. We expect that in such a situation a good discoverability algorithm can help eliminate items that do not get sold after sufficient exposure and enable the e-com company to optimize its inventory. We observe a very similar trend for the AeLTR algorithm as in our previous experiment. It gives better RPV and MRR, close to the LTR algorithm, but it has less β-discoverability compared to the eLTR approaches.
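The browsing model of Section 7.1 can be sketched as a short simulation loop. The function below is an illustrative assumption of the setup (top-down scan, 1/log2(r + 1) position discount, at most one purchase per visit), not the authors' exact code.

```python
import math
import random

def simulate_visit(ranked_items, true_purchase_prob, rng):
    """One simulated user visit over a ranked result list.

    The user scans results top-down; at rank r the purchase probability
    is discounted by 1/log2(r + 1) to mimic position bias. The user buys
    at most one item and leaves the site after the first purchase.
    """
    for r, item in enumerate(ranked_items, start=1):
        p = true_purchase_prob[item] / math.log2(r + 1)
        if rng.random() < p:
            return item   # purchased item; its reciprocal rank is 1/r
    return None           # no purchase in this visit
```

Repeating such visits for T iterations, while updating the observed purchase estimates that feed the ranker, yields the RPV, MRR, and β-discoverability numbers reported below.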
        Algorithms     RPV     NDCG     MRR     β−d
        LTR            9.56    0.57     0.45    0.11
        RSE            7.55    0.27     0.25    0.76
        UCBE           7.55    0.27     0.26    0.66
        eLTRb          8.2     0.33     0.31    0.67
        eLTRu          8.3     0.33     0.31    0.67
        eLTRur         8.4     0.33     0.32    0.67
        AeLTR          9.4     0.41     0.40    0.40

Table 2: Simulation of the eLTR framework, with |Q| = 100, |Z| = 5000, |L| = 200, β−d = 10%, β = 50, K = 6, x = 3, T = 50000.

8.    CONCLUSIONS AND FUTURE WORK

   This paper represents a first step toward formalizing the emerging new E-Com search problem as an optimization problem with multiple objectives, including sales (CRV), revenue (RPV), and discoverability besides relevance. We formally define these objectives and discuss multiple strategies for conducting exploration with learning to rank algorithms. We hope that our work will open up many new directions in research for optimizing e-com search. The obvious next step is to empirically validate the strategies based on log data from an e-com search engine. The theoretical framework also enables many interesting ways to further formalize the e-com search problem and develop new effective e-com search algorithms. Finally, the proposed discovery framework is just a small step toward solving the new problem of optimizing discoverability in e-com search; it is important to further develop more effective algorithms by using these algorithms as baselines.

9.    REFERENCES

 [1] S. Athey and D. Nekipelov. A structural model of sponsored search advertising auctions. In Sixth Ad Auctions Workshop, volume 15, 2010.
 [2] P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235–256, 2002.
 [3] C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender. Learning to rank using gradient descent. In Proceedings of the 22nd International Conference on Machine Learning, pages 89–96. ACM, 2005.
 [4] C. J. Burges. From RankNet to LambdaRank to LambdaMART: An overview. Learning, 11(23-581):81, 2010.
 [5] C. J. Burges, R. Ragno, and Q. V. Le. Learning to rank with nonsmooth cost functions. In Advances in Neural Information Processing Systems, pages 193–200, 2007.
 [6] N. Craswell. Mean reciprocal rank. In Encyclopedia of Database Systems, pages 1703–1703. Springer, 2009.
 [7] N. Craswell, O. Zoeter, M. Taylor, and B. Ramsey. An experimental comparison of click position-bias models. In Proceedings of the 2008 International Conference on Web Search and Data Mining, WSDM '08, pages 87–94. ACM, 2008.
 [8] J. H. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, pages 1189–1232, 2001.
 [9] V. Gabillon, B. Kveton, Z. Wen, B. Eriksson, and S. Muthukrishnan. Adaptive submodular maximization in bandit setting. In Advances in Neural Information Processing Systems, pages 2697–2705, 2013.
[10] A. Ghose and S. Yang. An empirical analysis of search engine advertising: Sponsored search in electronic markets. Management Science, 55(10):1605–1622, 2009.
[11] J. Gittins, K. Glazebrook, and R. Weber. Multi-armed Bandit Allocation Indices. John Wiley & Sons, 2011.
[12] A. Goswami, C. Zhai, and P. Mohapatra. Learning to rank and discover for e-commerce search. In International Conference on Machine Learning and Data Mining in Pattern Recognition.
[13] H. Li. A short introduction to learning to rank. IEICE Transactions on Information and Systems, 94(10):1854–1862, 2011.
[14] K. Järvelin and J. Kekäläinen. IR evaluation methods for retrieving highly relevant documents. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 41–48. ACM, 2000.
[15] T. Joachims. Optimizing search engines using clickthrough data. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 133–142. ACM, 2002.
[16] S. K. Karmaker Santu, P. Sondhi, and C. Zhai. On application of learning to rank for e-commerce search. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '17, pages 475–484, New York, NY, USA, 2017. ACM.
[17] H. Li. A short introduction to learning to rank. IEICE Transactions on Information and Systems, 94(10):1854–1862, 2011.
[18] L. Li, S. Chen, J. Kleban, and A. Gupta. Counterfactual estimation and optimization of click metrics in search engines: A case study. In Proceedings of the 24th International Conference on World Wide Web, pages 929–934. ACM, 2015.
[19] T.-Y. Liu. Learning to rank for information retrieval. Foundations and Trends in Information Retrieval, 3(3):225–331, 2009.
[20] A. Mahajan and D. Teneketzis. Multi-armed bandit problems. In Foundations and Applications of Sensor Management, pages 121–151. Springer, 2008.
[21] D. Sorokina and E. Cantu-Paz. Amazon search: The joy of ranking products. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '16, pages 459–460, New York, NY, USA, 2016. ACM.
[22] K. M. Svore, M. N. Volkovs, and C. J. Burges. Learning to rank with multiple objective functions. In Proceedings of the 20th International Conference on World Wide Web, pages 367–376. ACM, 2011.
[23] J. Vermorel and M. Mohri. Multi-armed bandit algorithms and empirical evaluation. In European Conference on Machine Learning, pages 437–448. Springer, 2005.
[24] R. R. Wilcox. Introduction to Robust Estimation and Hypothesis Testing. Academic Press, 2011.
[25] Y. Yue, J. Broder, R. Kleinberg, and T. Joachims. The k-armed dueling bandits problem. Journal of Computer and System Sciences, 78(5):1538–1556, 2012.