Towards Optimization of E-Commerce Search and
                                    Discovery

                           Anjan Goswami                                              ChengXiang Zhai                        Prasant Mohapatra
                 University of California, Davis                                     University of Illinois               University of California, Davis
                 agoswami@ucdavis.edu                                               Urbana-Champaign, IL                 pmohapatra@ucdavis.edu
                                                                                     czhai@illinois.edu


ABSTRACT                                                                                                Keywords
E-Commerce (E-Com) search is an emerging problem with                                                   e-com search, retrieval models, learning to rank, discover-
multiple new challenges. One of the primary challenges                                                  ability, exploration-exploitation
constitutes optimizing multiple objectives involving business
metrics such as sales and revenue and maintaining a discov-                                             1.   INTRODUCTION
ery strategy for the site. In this paper, we formalize the
                                                                                                           One of the most critical components of an e-commerce (e-
e-com search problem for optimizing metrics based on sales,
                                                                                                        com) marketplace is its search functionality. The goal of an
revenue, and relevance. We define a notion of item discover-
                                                                                                        e-commerce search engine is to help the buyers to find and
ability in search and show that learning to rank (LTR) algo-
                                                                                                        purchase their desirable products. The buyers’ intent repre-
rithms trained with behavioral features from e-com customer
                                                                                                        sents the demand side and the products listed by the sellers
interactions (eg. clicks,cart-adds, orders etc.) do not by
                                                                                                        represent the supply side of an e-com platform. The e-com
themselves address the discoverability problem. Instead, a
                                                                                                        search engine’s function can be considered to provide an ef-
suitable explore-exploit framework must be integrated with
                                                                                                        ficient matching platform for the supply and the demand
the ranking algorithm. We thus construct a practical dis-
                                                                                                        side and to facilitate transactions to generate revenue. Like
covery strategy by keeping a few top positions for discovery
                                                                                                        a web search engine, an e-com search also intends to ensure
and populating some of the items selected through explo-
                                                                                                        user satisfaction by presenting the most relevant items that
ration. Then, we present a few exploration strategies with
                                                                                                        a user is searching for. However, an e-com search engine
low regret bounds in terms of business metrics. We conduct
                                                                                                        generates revenue directly based on the products presented
a simulation study with a synthetically generated dataset
                                                                                                        to the users; this is in sharp contrast with how a web search
that represents items with different utility distribution and
                                                                                                        engine generates revenue through advertising and raises sig-
compares these strategies using metrics based on sales, rev-
                                                                                                        nificant new challenges for designing effective search algo-
enue, relevance, and discovery. We find that a strategy based
                                                                                                        rithms. Thus besides the usual goal of maximizing relevance
on adaptive submodular function based discovery framework
                                                                                                        of the search results, an e-com search engine also needs to
can provide a nice balance of business metrics and discov-
                                                                                                        optimize two additional goals that generally do not exist
erability compared to other strategies based on random ex-
                                                                                                        in web search scenarios, i.e., it has to 1) maximize the busi-
ploration or multi-armed bandit. However, another strategy,
                                                                                                        ness metrics such as revenue or sales from the purchases and
based on monotonic submodular optimization function that
                                                                                                        2) minimize the inventory cost by selling the items faster.
needs to be integrated with linear LTR models also works
                                                                                                        Both the objectives are dependent on the performance of
well for discovery and has nice performances with respect to
                                                                                                        the underlying ranking algorithm of the search engine other
sales, revenue, and relevance.
                                                                                                        than the assortment selection and demand generation. The
                                                                                                        first set of goals of maximizing the business metrics can be
Categories and Subject Descriptors                                                                      achieved by constructing a rank function using learning to
                                                                                                        rank framework with a suitable optimization objective that
H.3.3 [Information Search and Retrieval]: E-Com Search                                                  is in line with the business goals of the e-commerce com-
                                                                                                        pany. The second goal however requires the e-commerce
                                                                                                        sites to have a strategy of discovery so that it can expose
General Terms                                                                                           as many items to the customers faster and determine what
Algorithms                                                                                              products are selling. This allows the companies to readjust
                                                                                                        their assortment strategy to optimize the inventory and as-
                                                                                                        sociated costs. The discovery strategies however, can hurt
                                                                                                        the business goals since it requires showing previously unex-
                                                                                                        plored items to the customers. In e-commerce, learning to
 Permission to make digital or hard copies of all or part of this work for                              rank functions are trained by constructing mainly two types
 personal or classroom use is granted without fee provided that copies are                              of features: (a) features that arise from static contents of
 not made or distributed for profit or commercial advantage and that copies                             the product description (b) features that come from the be-
 bear this©notice
Copyright     2018 by and    the full
                        the paper’s     citation
                                    authors.      onpermitted
                                             Copying  the firstforpage.
                                                                   privateTo
                                                                           andcopy    otherwise,
                                                                                academic purposes. to   havioral signals such as clicks, cart-adds, sales, and review
In: J. Degenhardt,
 republish,           G. Di
                to post   onFabbrizio,
                              servers or S. Kallumadi, M. Kumar,
                                             to redistribute        Y.-C. requires
                                                              to lists,    Lin, A. Trotman, H. Zhao
                                                                                     prior specific
(eds.): Proceedings of the SIGIR 2018 eCom workshop, 12 July, 2018, Ann Arbor, Michigan, USA,
 permission      and/or a fee.                                                                          ratings. It has been observed [16, 21] that the second group
published  at http://ceur-ws.org
 Copyright 20XX ACM X-XXXXX-XX-X/XX/XX ...$15.00.                                                       of features is more useful for finding a comparative relevance
among the items. However, the use of the machine learning         literatures. The algorithms that are commonly used for ex-
algorithms in search engines tends to bias a search engine        ploration include -greedy [23], and variants of upper con-
toward favoring the viewed items by users due to the use of       fidence bound (UCB) algorithm [2]. Additionally, k-armed
features that are computed based on user behavior such as         dueling bandit framework [25] has been developed for find-
clicks, rates of “add to cart”, etc. Since a learning algorithm   ing out optimal bandit in a scenario where each bandit can
would rank items that have already attracted many clicks          be considered to optimize a specific objective. It is possible
on the top, it might overfit to promote the items viewed by       to frame a multiobjective problem with exploration compo-
users. As a result, some items might never have an oppor-         nent using such paradigm. In this paper, we aim to construct
tunity to be shown to a user (i.e., “discovered” by a user),      a framework that is simpler than K-armed bandit but more
thus also losing the opportunity to potentially gain clicks.      practical.
Such “undiscovered” products would then have to stay in the
inventory for a long time incurring extra cost and hurting
satisfaction of the product providers. Hence, it is important     3.    OPTIMIZATION OF E-COM SEARCH
to consider the integrating the aspect of discovery with the        We consider the problem of optimization of an E-Com
ranking mechanism for e-commerce search. In this paper,           search engine over a period of time {1, · · · , T }. Let us as-
we provide a practical yet theoretically sound framework for      sume that there are N queries, denoted by Q = (q1 , q2 , · · · , qN ),
learning to rank where it is possible to have a regulated dis-    that can be sent to the search engine during this time. Let
covery mechanism minimizing the loss in revenue, sales, or        Z = {ζ1 , · · · , ζM } be the set of M items and let the corre-
relevance metrics.                                                sponding prices of the items be given by {ρ1 , · · · , ρM }. Now,
   The key contributions of this paper can be described as        consider a rank function π and π(qi , t) outputs top K items
follows:                                                          which is a permutation from the set Z. Now, let us consider
   (1) This is the first paper that formalizes an e-com search    two binary random variables ΛS              C
                                                                                                       ijt , Λijt where the first one de-
problem as a multi-objective LTR problem with continuous          notes if a sale happens or not and the second one denotes if
exploration that we call learning to rank and discover prob-      there is a click or not for an item ζj given a query qt at time t.
lem. (2) We discuss how we can construct a solution to this       Now, given that we have these two binary random variables,
problem using the existing framework of multi-armed ban-          we can define the probability of sales given a query qi , item
dit and auction mechanism. (3) We also provide a novel            ζj at time t as p(ΛS   ijt = 1|qi , ζj , t) and similarly probability
paradigm combining algorithms from multi-objective opti-          of click for the same can be defined as p(ΛC         ijt = 1|qi , ζj , t).
mization and explore and exploit paradigms.                       Let’s also create a normalization constant Γ that denotes
                                                                  the total number of search sessions during the time period
2.     BACKGROUND                                                 we are considering to conduct this study.
                                                                    Now, we can define several objective functions for the e-
2.1      LTR and Ecommerce Search                                 commerce search such as the following:
                                                                    Conversion rate per visit (CRV):
   Learning to rank (LTR) algorithms [19] involve learning a
function for optimizing a relevance measure using an offline                             t=T i=N j=M
                                                                                       1 XX X
labeled training data set. This function can be a neural net,               gCRV =                   p(ΛS
                                                                                                        ijt = 1|qi , ζj , t)
boosted tree [3, 22], support vector machine [15] or linear                            Γ t=1 i=1 j=1
models [13] etc. The common ranking measures [14] include
normalized discounted cumulative gain (NDCG), mean re-              Revenue per visit (RPV):
ciprocal ranking (MRR), mean average precision (MAP)                                    t=T i=N j=M
etc. The loss functions based on these ranking measures                              1 XX X
                                                                           gRP V =                 p(ΛS
                                                                                                      ijt = 1|qi , ζj , t)ρj
are mostly non-smooth, however researchers [5, 17] derive                            Γ t=1 i=1 j=1
LTR algorithms such as LambdaMart [4] to address such
problems. There is also a work on multi-objective learning          We now also define a relevance based objective as follows:
to rank [22] that uses a click rate along with the human                                         t=T i=N
judged ratings to augment the computation of gradient di-                                     1 XX
                                                                                gREL (π) =              RM (π(qi , t))
rection from pairwise training data in LambdaMart algo-                                       Γ t=1 i=1
rithm. This algorithm can potentially be modified to learn
to maximize multiple graded objectives where the priorities       where RM can be any relevance measure such as normal-
are known in advance. There are not too many papers on            ized discounted cumulative gain (NDCG), which is gener-
LTR algorithms applied to e-com data. Recently, researchers       ally defined based on how well the ranked list π(q) matches
from Walmart [16] have conducted some experiments with            the ideal ranking produced based on human annotations of
multiple LTR algorithms with e-com data. In their obser-          relevance of each item to the query. The aggregation func-
vation, LambdaMart algorithm with a sales and click rate          tion does not have to be a sum (over all the queries); it
based objective worked well for their data set. In a sepa-        can also be, e.g., the minimum of relevance measure over all
rate paper, researchers from Amazon [21] reported use of          the queries, which would allow us to encode the preference
gradient boosted tree based regression methods [8] for LTR        for avoiding any query with a very low relevance measure
application. They also reported using various normalization       score. Note that it is possible to define the NDCG based on
techniques to avoid the biases generated due to the heavy         ratings created from direct user feedback such as clicks, add-
reliance of historical behavioral data in regression.             to-cart, or sales, since comparing items based on relevance
                                                                  in e-commerce can be hard [21]. In this paper, we assume
2.2      Online learning and Multi-armed bandit                   that the relevance function uses a rating system without
     Multi-armed bandit (MAB) algorithms [11, 20] have a rich     any specific notion of how that rating system is constructed.
With this generality, we can define the following objective        approach is that we can directly build on the existing LTR
for relevance:                                                     framework, though the new objective function would pose
                                                                   new challenges in designing effective and efficient optimiza-
          gREL (π) = mini∈[1,N ],j∈[1,T ] RM (π(qi , t))           tion algorithms to actually compute optimal rankings. One
   Now,we define a notion of discovery for an e-com engine.        disadvantage of this strategy is that we cannot easily control
   To formalize the notion of discoverability, we say that the     the tradeoff between different objectives (e.g., we sometimes
LTR function f is β-discoverable if all items are shown at         may want to set a lower bound on one objective rather than
least β times. Now, we can further define a β-discoverability      to maximize it). To address this limitation, we may allow
rate as the percentage of items that are impressed at least        some objectives to be treated as constraints.
β times in a fixed period of time. Let us now define again
a binary variable γi for every item ζi and then assume that
                                                                   4.2    Incremental Optimization of e-Com Search
γi = 1 if the item got shown in the search results for β times        An alternative strategy is to take an existing LTR ranking
and γi = 0 in case the item is not shown in the search results     function as a basis and seek to improve the ranking (e.g., by
more than β times. We can express this as follows:                 perturbation) so as to optimize multiple objectives as de-
                                      Pi=|Z|                       scribed above; such an incremental optimization strategy is
                                       i=1 γi                      more tractable as we will be searching for solutions to the op-
                 gβ−discoverability =
                                        |Z|                        timization problem in the neighborhood of an existing rank-
                                                                   ing function. Specifically, we instantiate the optimization
   We can further define another objective sales through rate      problem discussed in the previous section by assuming that
(STR) to denote the number of items sold in a period of time       the engine incrementally perturbs a learning to rank func-
as percentage of the inventory. This objective represent the       tion slightly in a way that the original relevance does not
efficiency of selling items faster. Let us define again a binary   vary too much and then each function π = (f1 , ..., fT ) at
variable λs for every item ζs and then assume that λs = 1          time step t = 1 to t = T gradually improves upon multiple
if the item got sold and λs = 0 in case the item is not sold.      objectives. Intuitively, we start with a function f and gen-
   Thus, the STR objective function when using policy π can        erate the next function in the “neighborhood” of an existing
be written as follows:                                             ranking function f within a relevance bound and we take
                                     i=|Z|
                                     X                             each step in a way that it attempts to improve the sales and
                       gST R (π) =           λi                    revenue with high probability and it also tries to achieve bet-
                                     i=1                           ter β-discoverability and STR for the items in its inventory.
  Now, we define the objective of an e-com search engine as        However, the search space for all such ranking functions is
the following:                                                     still intractable and we need a heuristic to obtain a “good”
                                                                   Pareto optimal solution for the multi objective optimization
      maximize gCRV , gRPV , gREL , gβ−discoverability , gSTR      problem.
                                                                      We thus need to construct an optimization framework that
   This is a multi-objective optimization and has an explo-
                                                                   integrates an exploration mechanism with a learning to rank
ration component in the form of discovery objective. The
                                                                   paradigm.
priorities of these objectives can be business dependent. For
example, a business aiming to maximize the gross revenue           4.3    Discovery Framework
can prioritize maximizing revenue and on the other hand, a            One way of constructing a framework that optimizes an
business aiming to improve sales growth may want to maxi-          e-com site for multiple business objectives along with an
mize the conversions. Similarly, a business that depends on        exploration option can be to use a fixed number of rank
periodically bringing new items in the inventory may want          positions say x for exploration and use the rest of the top
to prioritize the discovery objective along with sales and rev-    (K − x) positions for exploitation. In this set-up, we can
enue. In most real scenarios, such business priorities come        construct an explore set of items E where all the items have
from the strategic directions of an e-commerce platform.           been shown less than β times. We then select top x candi-
   We now discuss the strategies for solving this optimization     dates for a query qi from the intersection of the recall set of
problem.                                                           items for the query and the set E. Let’s call this set corre-
                                                                   sponding to query qi as Ei . The items selected for the rest
4.    STRATEGIES FOR SOLVING THE OP-                               of the (K − x) positions can be based on a suitable multi-
      TIMIZATION PROBLEM                                           objective LTR ranking function. This is a framework that
                                                                   is practical and the cost of the exploration can be estimated
   Since there are multiple objectives to optimize, it is impos-
                                                                   by the loss of revenue for selecting not the most optimal
sible to directly apply an existing Learning to Rank (LTR)
                                                                   items for x positions. Note, it is also possible to select any
method to optimize all the objectives. However, there are
                                                                   rank positions to show the explored items. However, in this
multiple ways to extend an LTR method to solve the prob-
                                                                   paper, let us assume that we keep those x items at top for
lem as we will discuss below.
                                                                   keeping our framework simple to analyze.
4.1     Direct extension of LTR                                    4.4    LTR and Discoverability
   One simple strategy is to combine the multiple objectives
                                                                     Now that we have a formal model for the e-commerce
with appropriate weighting parameters so as to form one
                                                                   search optimization problem, we want to establish the fol-
single objective function, which can then be used in a tra-
                                                                   lowing point:
ditional LTR framework to find a ranking that would opti-
mize the consolidated objective function. The weights can             • A LTR function is not optimal for discoverability cri-
be based on the business priorities. The advantage of this              teria.
                                                                    on sales over impression for the query. If an item ζj is in
                                                                    the set E ∩ Ri and is shown bj times in T iterations, and is
      In an e-com site, only top K items are shown to the
                                                                    sold aj times inqbetween, then the UCB score of the item ζj
      customers. Suppose there is an item ζ that is rele-                      a
      vant to a set of queries Q. Now, assume that for each         is ucbj = bjj + 2 logb2jT +1 . Note, this is for a specific query.
      query q ∈ Q there are K other relevant old items for          We then select x items based on top UCB scores.
      which f (X2)) > 0. Assuming that the K items are                 Explore LTR (eLTR In this, we define a function that
      equally relevant, we can say that the ζ has same value        we call explore LTR (eLTR) to select the x items. The rest
      of f (X1) compared to the k old items. Then, ζ will al-       of the items for top K can be chosen using the traditional
      ways rank below the top K items for the set of queries        LTR. Then, we can either keep the x items on top or we can
      in Q. Suppose the current number of impressions for           rerank all K items based on eLTR.
      ζ is less than β, then it has no chance of receiving im-         The main motivation for the eLTR is the observation that
      pression under the ranking framework using function f         there is inherent overfitting in the regular ranking function
      because it will always be ranked below the top K items        used in an e-com search engine that hinders exploration,
      unless there is new query that ranks this item higher.        i.e., hinders improvement of β-discoverability and STR. The
      This means that as long as we can find a percentage of        overfitting is mainly caused by a subset of features derived
      items that receives below β impressions, it is hard to        from user interaction data. Such features are very important
      improve the β-discoverability using a traditional LTR         as they help inferring a user’s preferences, and the overfit-
      mechanism without any integrated discovery mecha-             ting is actually desirable for many repeated queries and for
      nism.                                                         items that have sustained interests t users (since they are
                                                                    “test cases” that have occurred in the training data), but
5.    DISCOVERY STRATEGIES                                          it gives old items a biased advantage over the new items,
                                                                    limiting the increase of β-discoverability and STR. Thus the
   In this section, we propose several strategies to select the x   main idea behind e-LTR is thus to separate such features
items for discovery that aims to minimize the loss of business      and introduce a parameter to restrict their influences on the
metrics.                                                            ranking function, indirectly promoting STR. Formally, we
                                                                    note that in general, a ranking function can be written in
5.1    Auction based strategy                                       the following form:
   In this strategy, the items for top x positions can be as-
signed based on an auction strategy where sellers can bid                            y = f (X) = g(f1 X1), f2 (X2))
for the positions. This strategy has been used in sponsored         where y ∈ R denotes a ranking score and X ∈ RN is a
search for search engine companies such as Google and in            N dimensional feature vector, X1 ∈ RN 1 and X2 ∈ RN 2
sponsored product for e-commerce company such as Ama-               are two different groups of features such that N 1 + N 2 =
zon. Design of such auction mechanism that balances the             N, X1 ∪ X2 = X. The two groups of features are meant to
revenue and relevance is a complex subject and has been             distinguish features that are unbiased (e.g., content match-
discussed in recent papers by Ghose et al. [10] and Athey at        ing features) from those that are inherently biased (e.g.,
al. [1]. However, many e-commerce businesses may not want           clickthrough-based features). Here g is an aggregation func-
to use auction strategy to aid discovery since it can hurt the      tion which is monotonic with respect to both arguments. It
growth of seller participation on those sites.                      is easy to show that any linear model can be written as a
                                                                    monotonic aggregation function. It is not possible to use
5.2    Exploration with LTR (eLTR)                                  such representation for models such as additive trees. How-
    This is a strategy discussed in a recent paper by Goswami       ever, our previous techniques do not have such limitation
et al. [12]. In this strategy, we define the set from which         since they are completely separated from the LTR. In this
the LTR function selects the items as Li ⊂ Ri for a given           paper, we keep our discussion limited to linear models. We
query qi . We assume that all the items outside set Li are          now define explore LTR (eLTR) function as follows:
not β-discoverable. Then, L = ∪i=N   i=1 Li is the set of all β-
                                                                                   y e = fe (X) = g(f (X1),  × f (X2))
discoverable items. Hence, the set E = R \ L can then be
consisting of all the items that require exploration.               where y e ∈ R and 0 ≤  ≤ 1 is a variable in our algorithmic
    Goswami et al. [12] discuss three strategies to incorporate     framework. Since, g is monotonic, fe (X) <= f (X) when
discovery in an e-commerce search.                                   ≤ 1. Since feature set X2 is a biased feature set favoring
    Random selection based exploration from the re-                 old items, we can expect ranking based on f e would be more
call set (RSE): This is a baseline strategy for continuous          in favor of new items in comparison with the original f ,
exploration with a LTR algorithm. In this, for every query          achieving the goal of emphasizing exploration of new items.
qi , we randomly select x items from the set E ∩Ri . Then, we       Note that  controls the amount of exploration: the smaller 
put these x items on top of the other (k − x) items that are        is, the more exploration (at the cost of exploitation). Since
selected using LTR from the set Ri . The regret here will be        the maximum exploration is achieved when =0, in which
linear with the number of search sessions with exploration.         case, ranking is entirely relying on f1 , the only loss in the
    Upper confidence bound (UCB) based exploration                  original objective function is incurred by the removal of f2 .
from the recall set (UCBE): This is another simple strat-           By controlling what features to be included in f2 , we can
egy that uses a variant of UCB based algorithm for explo-           control the upper bound of the loss. In this sense, eLTR
ration instead of random sampling. Here, we maintain a              ensures a “safe” exploration strategy since f1 is always active.
MAB for each query. We consider each item in the set                Note, this composed function gradually becomes the LTR
E ∩ Ri as an arm for the MAB corresponding to a query               function when  tends to 1. There can be various ways of
qi . We maintain an UCB score for each of those items based         constructing the  such as the following:
   eLTR basic exploration (eLTRb): In this strategy,            lenges. Here we discuss some ideas for evaluating the pro-
                 I
we keep  = Tmax    . Here, I is an iteration and Tmax is a     posed e-LTR algorithm, which we hope will stimulate more
maximum number of iteration after which everything can          work in this direction.
be reset. This is a very simple strategy where the eLTR just       Given a set of queries and an initial ranking function, an
increases the importances of the behavioral features gradu-     e-LTR method is expected to “discover” the ideal ranking
ally with every iteration.                                      for each query, over a sequence of iterations. Ideal ranking
   eLTR ucb weighted exploration (eLTRu): In this               may be defined as one that maximizes business metrics like
                          ucb
strategy, we keep  = Ujj . Here, Uj is a normalization         sales, revenue etc. Since exploration is involved, one can
factor and in our experiment it is chosen to be the maximum     expect the quality of ranking (N DCG) and consequently
UCB score in the set E ∩ Ri .                                   sales/revenue to fluctuate for individual queries during the
   eLTR ucb weighted exploration and reranking (eL-             discovery process. However, the benefit of an e-LTR method
TRur): This strategy first selects the top x items using        lies in its ability to deliver greater aggregate sales/revenue
eLTRu and it selects the remaining (k − x) items using the      over some reasonable number of iterations N , compared to
classic LTR and then it reranks the k items using eLTRu.        a non-discovery LTR method. We must thus compare our
   Goswami et al. [12] show in their paper that eLTR can        methods on the iterative growth in quality of ranking, and
be considered as a monotonic sub-modular function and can       aggregate gain in business metrics.
thus be approximated by a greedy algorithm.                        The ideal approach for conducting such an evaluation would
                                                                require simultaneously deploying all candidate methods to
5.3   Adaptive submodular function based eLTR                   live user traffic, and computing various user engagement
      (AeLTR)                                                   metrics such as click through rate,sales, revenue etc. How-
                                                                ever this strategy is difficult to implement in practice. Since
  This is an extension of the eLTR mechanism presented by
                                                                user traffic received by each candidate method is different,
Goswami et al. [12]. However, it makes the strategy much
                                                                we need to direct substantial amount of traffic to each method
more general by allowing us to use any LTR rank function
                                                                to make observations comparable and conclusions statisti-
going beyond the limitation of linear LTR models. In this,
                                                                cally significant. Deployment of a large number of experi-
we use the following function:
                                                                mental and likely sub-optimal ranking functions, especially
                   y a = fa (X) = a f (X)                      when evaluating baselines, can result in significant business
                                                                losses for e-Commerce search engines. More importantly,
Here, the a is a function of UCB as follows:                   such an approach of using live user traffic would not allow
                                 ucbj                           us to have a fair comparison of any future algorithm with
                         a =                                   the ones that we test today, so it does not facilitate further
                                  Uj
                                                                study of the problem.
This algorithm has been discussed in a paper by Gabillon           Perhaps a good and feasible strategy is to design a simulation-
et al. [9] and is proven to be also a monotonic sub-modular     based counterfactual evaluation method [18]. Here, unlike a
function. Hence, a simple greedy algorithm can give a nice      traditional static test collection, we also have to introduce
approximation also for this. We however do not know how         models to simulate user behavior and model the biases ob-
this strategy can be compared in terms of losses and the        served in real system. This can be a topic itself and we omit
discovery metric with respect to the other approaches that      further discussion on this as it is out of scope for this short
we have discussed in this paper.                                paper.
5.4   Regret of the eLTR strategies
   The RSE function has linear regret since it is random. The
                                                                7.   EXPERIMENTAL RESULTS
regret for this can be arbitrarily bad and can thus be given       In this section, we first construct a synthetic historical
by O(Rl ). The regret for the UCBE algorithm can be given       dataset with queries, items and their prices. We also gener-
by O(log (Rl )). Both eLTR and AeLTR have similar regret        ate the true purchase probabilities and utility scores for the
because they both uses monotonic sub-modularity property.       item and query pairs. Additionally, we use a specific rank
These algorithms pull items using a score that is a func-       function to simulate the behavior of a trained LTR model.
tion of the LTR score and are expected to provide a better         Then we conduct a simulation as described in section ??
loss compared to the approaches that are based on multi-        with various exploration strategies. During the simulation
armed bandit algorithms. However, it is not immediately         we use the observed purchase probabilities estimated from
clear how these algorithms can optimize the discoverability     the purchase feedback as the most important feature for the
objective. The regret of the eLTR group of algorithms can       rank function but we use the true probabilities generated
                           |E|                                  during the initial data generation phase to simulate the user
be estimated by O(1+1/e− x ) times worst compared to the
                                                                behavior.
optimal which comes from their monotonic sub-modularity
                                                                   The main goal of this experimental study is to evaluate
property. The eLTR algorithms can be considered to be sim-
                                                                the behavior of the exploration strategies with different dis-
ilar to the -greedy approaches where the regret bound is not
                                                                tributions of the utility scores representing different state of
better than the UCB algorithm. However, in our simulation,
                                                                the inventory in an e-com company.
we want to study the discoverability metrics for such eLTR
                                                                   We evaluate our algorithms by running the simulation for
algorithms.
                                                                T times. We compute RPV and β-discoverability at the end
                                                                of T iterations. We also compute a purchase based mean re-
6.    EVALUATION METHODOLOGY                                    ciprocal ranking [6] metric (MRR). This metric is computed
  Due to the involvement of multiple objectives, the eval-      by summing the reciprocal ranks of all the items that are
uation of E-Com search algorithms also presents new chal-       purchased in various user visits for all queries. Moreover,
we also discretize our gold utility score between 1 to 5 and              Algorithms     RPV     NDCG      MRR      β−d
generate a rating for each item. This also allows us to com-              LTR            0.12     0.96     0.54      0.10
pute a mean NDCG score at k-th position for all the search                RSE            0.06     0.76     0.21      0.78
sessions as a relevance metric.                                           UCBE           0.08     0.87     0.28      0.66
  We expect to see that the RPV and NDCG of the LTR                       eLTRb          0.09     0.94     0.33      0.67
function will be the best. however the β-discoverability val-             eLTRu          0.09     0.94     0.33      0.67
ues will be better in ranking policies that use an exploration            eLTRur         0.09     0.94     0.33      0.67
strategy. The new ranking strategies will incur loss in RPV               AeLTR          0.10     0.95     0.40      0.51
and NDCG and based on our theoretical analysis we ex-
pect the eLTR methods to have less loss compared to the
RSE and UCB based approaches in those measures. We                 Table 1: Simulation of eLTR framework, with |Q| = 100, |Z| =
also expect to see a loss in MRR for all exploration meth-         5000, |L| = 200, β − d = 10%, β = 50, K = 6, x = 3, T = 50000.
ods. However, we mainly interested in observing how these
algorithms perform in β-discoverability metric compared to
LTR.                                                               through the search results one after another from the top
                                                                   and can purchase an item based on that item’s purchase
                                                                   probability for that query. Note, in order to keep the simu-
7.1    Synthetic data generation                                   lation simple, we consider an user only purchases one item
   We first generate a set of N queries and M items. We            in one visit and leaves the site after that. We also can ap-
then assign the prices of the items by randomly drawing a          ply a discount to the probability of purchase logarithmically
number between a minimum and a maximum price from a                                                          1
                                                                   for each lower rank by multiplying log (r+1) , where r is the
                                                                                                           2
multi-modal Gaussian distribution that can have up to 1 to         ranking position of the item. This is to create an effect of
10 peaks for a query. We select the specific number of peaks       the position bias [7].
for a query uniform randomly. We also assign a subset of
the peak prices to a query to be considered as the set of it’s     7.2    Description of the experimental study
preferred prices. This makes a situation where every query
                                                                      We conduct two sets of experiments with this simulated
may have a few preferred price peak points where it may
                                                                   data.
also have the sales or revenue peaks.
                                                                      In the first set of experiments we use a Gaussian distri-
   Now that we have the items and queries defined, we ran-
                                                                   bution with mean 0.5 and the variance 0.1 for generating
domly generate an utility score, denoted by (uij ) for every
                                                                   the utility scores, but everything else is same as the previ-
item ζj for a query qi . In our set up, we use uniform ran-
                                                                   ous experiment. This means that the utility of the items
dom, Gaussian and a long tailed distribution for selecting
                                                                   in the inventory follows a normal distribution. We generate
the utilities. These three different distributions represent
                                                                   the scores for all the products from a Gaussian distribution
three scenarios for a typical e-com company’s inventory. Ad-
                                                                   with mean 0.5 and the variance 0.1. The table 1 shows the
ditionally, we generate a purchase probability between 0.02
                                                                   tables with final metrics for all the algorithms. We observe
to 0.40 for every item for every query. We generate these
                                                                   that with a Gaussian distribution of utility scores the eLTR
probabilities such that they correlates with the utility score.
                                                                   approaches have better MRR, and β-discoverability. We ob-
We generate these numbers in a way so that these are cor-
                                                                   serve that the AeLTR approach works well by producing
related with the utility scores with a statistically significant
                                                                   good MRR and yet the β-discoverability is better than LTR.
(p-value less than 0.10) Pearson correlation coefficient [24].
                                                                   This is a powerful result since it shows that it is possible to
We also intend to correlate the purchase probability with
                                                                   integrate this with more sophisticated LTR algorithms
the preferred peak prices for a query. Hence, we give an
                                                                      In the second set of experiment, we use a power law to
additive boost between 0 to 0.1 to the purchase probability
                                                                   generate the utility distribution. This means that only a
in proportion to the absolute difference of the price of the
                                                                   small set of items here can be considered valuable in this
item from the closest preferred mean price for that query.
                                                                   scenario. The table 2 shows the final metrics for this case.
By generating the purchase probabilities in this way, we en-
                                                                   We notice that even with this distribution of utility scores
sure that the actual purchase probabilities are related to the
                                                                   the eLTR variants have smaller loss in RPV, NDCG, and
preferred prices for the queries, and also it is related to the
                                                                   in MRR. Note that in this distribution, the discoverability
utility scores of the items for a given query. Now, we de-
                                                                   can be considered to be naturally not so useful since a large
fine a β-discoverability rate β = b and selects b × M items
                                                                   number of items are not that valuable. We expect in such
randomly from the set of all items. In our simulation, we
                                                                   situation, a nice discoverability algorithm can help to elimi-
assume that the estimated (observed) purchase probability
                                                                   nate items that do not get sold after sufficient exposure and
for all the items in set E at the beginning can be zero. The
                                                                   enable the e-com company to optimize it’s inventory. We
rest of the items purchase probability are assumed to be
                                                                   observe a very similar trend for AeLTR algorithm as our
estimated correctly at the beginning. Now, we create a sim-
                                                                   previous experiment. It gives better RPV and MRR close
ple rank function that is a weighted linear function of the
                                                                   to the LTR algorithm but it has less β-discoverability com-
utility score (u), the observed purchase probability (po ), and
                                                                   pared to eLTR approaches.
the normalized absolute difference between the product price
and the closest preferred mean price (m̂p for the query such
that l = 0.60po + 0.20u + 0.20m̂p . Here l denotes the score       8.    CONCLUSIONS AND FUTURE WORK
of the ranker. This ranker simulates a trained LTR function          This paper represents a first step toward formalizing the
in e-com search where usually the sales can be considered          emerging new E-Com search problem as an optimization
the most valuable behavioral signal.                               problem with multiple objectives including the sales(CRV),
   We now construct an user model. Here, an user browses           revenue (RPV), and discoverability besides relevance. We
       Algorithms     RPV     NDCG      MRR     β−d              [10] A. Ghose and S. Yang. An empirical analysis of search
       LTR            9.56     0.57     0.45     0.11                 engine advertising: Sponsored search in electronic
       RSE            7.55     0.27     0.25     0.76                 markets. Management Science, 55(10):1605–1622,
       UCBE           7.55     0.27     0.26     0.66                 2009.
       eLTRb           8.2     0.33     0.31     0.67            [11] J. Gittins, K. Glazebrook, and R. Weber. Multi-armed
       eLTRu           8.3     0.33     0.31     0.67                 bandit allocation indices. John Wiley & Sons, 2011.
       eLTRur          8.4     0.33     0.32     0.67            [12] A. Goswami, C. Zhai, and P. Mohapatra. Learning to
       AeLTR           9.4     0.41     0.40     0.40                 rank and discover for e-commerce search. In
                                                                      International Conference on Machine Learning and
                                                                      Data Mining in Pattern Recognition.
Table 2: Simulation of eLTR framework, with |Q| = 100, |Z| =     [13] L. Hang. A short introduction to learning to rank.
5000, |L| = 200, β − d = 10%, β = 50, K = 6, x = 3, T = 50000.
                                                                      IEICE TRANSACTIONS on Information and
                                                                      Systems, 94(10):1854–1862, 2011.
formally define these objectives and discuss multiple strate-    [14] K. Järvelin and J. Kekäläinen. Ir evaluation methods
gies for conducting exploration with the learning to rank             for retrieving highly relevant documents. In
algorithms. We hope that our work will open up many new               Proceedings of the 23rd annual international ACM
directions in research for optimizing e-com search. The obvi-         SIGIR conference on Research and development in
ous next step is to empirically validate the strategies based         information retrieval, pages 41–48. ACM, 2000.
on log data from an e-com search engine. The theoreti-           [15] T. Joachims. Optimizing search engines using
cal framework also enables many interesting ways to further           clickthrough data. In Proceedings of the eighth ACM
formalize the e-com search problem and develop new effec-             SIGKDD international conference on Knowledge
tive e-com search algorithms. Finally, the proposed discov-           discovery and data mining, pages 133–142. ACM,
ery framework is just a small step toward solving the new             2002.
problem of optimizing discoverability in e-com search; it is     [16] S. K. Karmaker Santu, P. Sondhi, and C. Zhai. On
important to further develop more effective algorithms by             application of learning to rank for e-commerce search.
using these algorithms as baselines.                                  In Proceedings of the 40th International ACM SIGIR
                                                                      Conference on Research and Development in
                                                                      Information Retrieval, SIGIR ’17, pages 475–484, New
9.   REFERENCES                                                       York, NY, USA, 2017. ACM.
 [1] S. Athey and D. Nekipelov. A structural model of            [17] H. Li. A short introduction to learning to rank. IEICE
     sponsored search advertising auctions. In Sixth ad               TRANSACTIONS on Information and Systems,
     auctions workshop, volume 15, 2010.                              94(10):1854–1862, 2011.
 [2] P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time       [18] L. Li, S. Chen, J. Kleban, and A. Gupta.
     analysis of the multiarmed bandit problem. Machine               Counterfactual estimation and optimization of click
     learning, 47(2-3):235–256, 2002.                                 metrics in search engines: A case study. In Proceedings
 [3] C. Burges, T. Shaked, E. Renshaw, A. Lazier,                     of the 24th International Conference on World Wide
     M. Deeds, N. Hamilton, and G. Hullender. Learning to             Web, pages 929–934. ACM, 2015.
     rank using gradient descent. In Proceedings of the          [19] T.-Y. Liu. Learning to rank for information retrieval.
     22nd international conference on Machine learning,               Foundations and Trends in Information Retrieval,
     pages 89–96. ACM, 2005.                                          3(3):225–331, 2009.
 [4] C. J. Burges. From ranknet to lambdarank to                 [20] A. Mahajan and D. Teneketzis. Multi-armed bandit
     lambdamart: An overview. Learning, 11(23-581):81,                problems. In Foundations and Applications of Sensor
     2010.                                                            Management, pages 121–151. Springer, 2008.
 [5] C. J. Burges, R. Ragno, and Q. V. Le. Learning to           [21] D. Sorokina and E. Cantu-Paz. Amazon search: The
     rank with nonsmooth cost functions. In Advances in               joy of ranking products. In Proceedings of the 39th
     neural information processing systems, pages 193–200,            International ACM SIGIR Conference on Research
     2007.                                                            and Development in Information Retrieval, SIGIR ’16,
 [6] N. Craswell. Mean reciprocal rank. In Encyclopedia of            pages 459–460, New York, NY, USA, 2016. ACM.
     Database Systems, pages 1703–1703. Springer, 2009.          [22] K. M. Svore, M. N. Volkovs, and C. J. Burges.
 [7] N. Craswell, O. Zoeter, M. Taylor, and B. Ramsey. An             Learning to rank with multiple objective functions. In
     experimental comparison of click position-bias models.           Proceedings of the 20th international conference on
     In Proceedings of the 2008 International Conference              World wide web, pages 367–376. ACM, 2011.
     on Web Search and Data Mining, WSDM ’08, pages              [23] J. Vermorel and M. Mohri. Multi-armed bandit
     87–94. ACM, 2008.                                                algorithms and empirical evaluation. In European
 [8] J. H. Friedman. Greedy function approximation: a                 conference on machine learning, pages 437–448.
     gradient boosting machine. Annals of statistics, pages           Springer, 2005.
     1189–1232, 2001.                                            [24] R. R. Wilcox. Introduction to robust estimation and
 [9] V. Gabillon, B. Kveton, Z. Wen, B. Eriksson, and                 hypothesis testing. Academic press, 2011.
     S. Muthukrishnan. Adaptive submodular                       [25] Y. Yue, J. Broder, R. Kleinberg, and T. Joachims.
     maximization in bandit setting. In Advances in Neural            The k-armed dueling bandits problem. Journal of
     Information Processing Systems, pages 2697–2705,                 Computer and System Sciences, 78(5):1538–1556,
     2013.                                                            2012.