<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Optimization of E-Commerce Search and Discovery</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anjan Goswami</string-name>
          <email>agoswami@ucdavis.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>ChengXiang Zhai</string-name>
          <email>czhai@illinois.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Prasant Mohapatra</string-name>
          <email>pmohapatra@ucdavis.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of California</institution>
          ,
          <addr-line>Davis</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Illinois</institution>
          ,
          <addr-line>Urbana-Champaign, IL</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <fpage>2</fpage>
      <lpage>8</lpage>
      <abstract>
        <p>E-Commerce (E-Com) search is an emerging problem with multiple new challenges. One of the primary challenges is optimizing multiple objectives involving business metrics, such as sales and revenue, while maintaining a discovery strategy for the site. In this paper, we formalize the e-com search problem for optimizing metrics based on sales, revenue, and relevance. We define a notion of item discoverability in search and show that learning to rank (LTR) algorithms trained with behavioral features from e-com customer interactions (e.g., clicks, cart-adds, orders, etc.) do not by themselves address the discoverability problem. Instead, a suitable explore-exploit framework must be integrated with the ranking algorithm. We thus construct a practical discovery strategy by reserving a few top positions for discovery and populating them with items selected through exploration. Then, we present a few exploration strategies with low regret bounds in terms of business metrics. We conduct a simulation study with a synthetically generated dataset that represents items with different utility distributions, and we compare these strategies using metrics based on sales, revenue, relevance, and discovery. We find that a strategy based on an adaptive submodular discovery framework can provide a nice balance of business metrics and discoverability compared to other strategies based on random exploration or multi-armed bandits. However, another strategy, based on a monotone submodular optimization function that must be integrated with linear LTR models, also works well for discovery and performs well with respect to sales, revenue, and relevance.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Categories and Subject Descriptors</title>
      <p>H.3.3 [Information Search and Retrieval]: E-Com Search</p>
      <sec id="sec-1-1">
        <title>Algorithms</title>
        <p>Keywords: e-com search, retrieval models, learning to rank,
discoverability, exploration-exploitation</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>1. INTRODUCTION</title>
      <p>
        One of the most critical components of an e-commerce
(e-com) marketplace is its search functionality. The goal of an
e-commerce search engine is to help buyers find and
purchase their desired products. The buyers' intent
represents the demand side, and the products listed by the sellers
represent the supply side of an e-com platform. The e-com
search engine's function can be considered to provide an
efficient matching platform for the supply and the demand
sides and to facilitate transactions that generate revenue. Like
a web search engine, an e-com search engine also intends to ensure
user satisfaction by presenting the most relevant items that
a user is searching for. However, an e-com search engine
generates revenue directly based on the products presented
to the users; this is in sharp contrast with how a web search
engine generates revenue through advertising, and it raises
significant new challenges for designing effective search
algorithms. Thus, besides the usual goal of maximizing the relevance
of the search results, an e-com search engine also needs to
optimize two additional goals that generally do not exist
in web search scenarios, i.e., it has to 1) maximize
business metrics such as revenue or sales from purchases, and
2) minimize the inventory cost by selling the items faster.
Both objectives depend on the performance of
the underlying ranking algorithm of the search engine, in
addition to assortment selection and demand generation. The
first goal, maximizing the business metrics, can be
achieved by constructing a ranking function using a learning to
rank framework with a suitable optimization objective that
is in line with the business goals of the e-commerce
company. The second goal, however, requires the e-commerce
site to have a discovery strategy so that it can expose
as many items to the customers as quickly as possible and
determine which products are selling. This allows the
company to readjust its assortment strategy to optimize the
inventory and associated costs. The discovery strategies,
however, can hurt the business goals, since they require showing
previously unexplored items to the customers. In e-commerce,
learning to rank functions are trained by constructing mainly two types
of features: (a) features that arise from the static content of
the product description, and (b) features that come from
behavioral signals such as clicks, cart-adds, sales, and review
ratings. It has been observed [
        <xref ref-type="bibr" rid="ref16 ref21">16, 21</xref>
        ] that the second group
of features is more useful for finding a comparative relevance
among the items. However, the use of machine learning
algorithms in search engines tends to bias a search engine
toward favoring the items already viewed by users, due to the use of
features that are computed from user behavior, such as
clicks, rates of "add to cart", etc. Since a learning algorithm
would rank items that have already attracted many clicks
at the top, it might overfit to promote the items viewed by
users. As a result, some items might never have an
opportunity to be shown to a user (i.e., "discovered" by a user),
thus also losing the opportunity to potentially gain clicks.
Such "undiscovered" products would then have to stay in the
inventory for a long time, incurring extra cost and hurting the
satisfaction of the product providers. Hence, it is important
to consider integrating the aspect of discovery with the
ranking mechanism for e-commerce search. In this paper,
we provide a practical yet theoretically sound framework for
learning to rank in which it is possible to have a regulated
discovery mechanism that minimizes the loss in revenue, sales, and
relevance metrics.
      </p>
      <p>The key contributions of this paper can be described as
follows:</p>
      <p>(1) This is the first paper that formalizes an e-com search
problem as a multi-objective LTR problem with continuous
exploration, which we call the learning to rank and discover
problem. (2) We discuss how we can construct a solution to this
problem using the existing frameworks of multi-armed
bandits and auction mechanisms. (3) We also provide a novel
paradigm combining algorithms from multi-objective
optimization and explore-exploit paradigms.</p>
    </sec>
    <sec id="sec-3">
      <title>2. BACKGROUND</title>
    </sec>
    <sec id="sec-4">
      <title>2.1 LTR and E-Commerce Search</title>
      <p>
        Learning to rank (LTR) algorithms [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] involve learning a
function for optimizing a relevance measure using an offline
labeled training data set. This function can be a neural network,
a boosted tree [
        <xref ref-type="bibr" rid="ref22 ref3">3, 22</xref>
        ], support vector machine [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] or linear
models [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] etc. The common ranking measures [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] include
normalized discounted cumulative gain (NDCG), mean
reciprocal rank (MRR), mean average precision (MAP),
etc. The loss functions based on these ranking measures
are mostly non-smooth; however, researchers [
        <xref ref-type="bibr" rid="ref17 ref5">5, 17</xref>
        ] derive
LTR algorithms such as LambdaMart [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] to address such
problems. There is also work on multi-objective learning
to rank [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] that uses a click rate along with human-judged
ratings to augment the computation of the gradient
direction from pairwise training data in the LambdaMart
algorithm. This algorithm can potentially be modified to learn
to maximize multiple graded objectives where the priorities
are known in advance. There are not many papers on
LTR algorithms applied to e-com data. Recently, researchers
from Walmart [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] have conducted experiments with
multiple LTR algorithms on e-com data. In their
observation, the LambdaMart algorithm with a sales- and click-rate-based
objective worked well for their data set. In a
separate paper, researchers from Amazon [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] reported the use of
gradient-boosted-tree-based regression methods [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] for an LTR
application. They also reported using various normalization
techniques to avoid the biases generated by the heavy
reliance on historical behavioral data in regression.
      </p>
    </sec>
    <sec id="sec-4a">
      <title>2.2 Online Learning and Multi-armed Bandit</title>
      <p>
        Multi-armed bandit (MAB) algorithms [
        <xref ref-type="bibr" rid="ref11 ref20">11, 20</xref>
        ] have a rich
literature. The algorithms that are commonly used for
exploration include ε-greedy [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] and variants of the upper
confidence bound (UCB) algorithm [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Additionally, the K-armed
dueling bandit framework [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] has been developed for
finding the optimal bandit in a scenario where each bandit can
be considered to optimize a specific objective. It is possible
to frame a multi-objective problem with an exploration
component using such a paradigm. In this paper, we aim to construct
a framework that is simpler than the K-armed dueling bandit but more
practical.
      </p>
    </sec>
    <sec id="sec-4b">
      <title>3. OPTIMIZATION OF E-COM SEARCH</title>
      <p>We consider the problem of optimizing an E-Com
search engine over a period of time {1, ..., T}. Let us
assume that there are N queries, denoted by Q = (q_1, q_2, ..., q_N),
that can be sent to the search engine during this time. Let
Z = {ζ_1, ..., ζ_M} be the set of M items, and let the
corresponding prices of the items be given by {π_1, ..., π_M}. Now,
consider a rank function σ such that σ(q_i, t) outputs the top K items
as an ordered list drawn from the set Z. Next, consider
two binary random variables δ^S_{ijt} and δ^C_{ijt}, where the first
denotes whether a sale happens and the second denotes whether
there is a click for an item j given a query q_i at time t.
Given these two binary random variables,
we can define the probability of a sale given a query q_i and an item
j at time t as p(δ^S_{ijt} = 1 | q_i, j, t), and similarly the probability
of a click as p(δ^C_{ijt} = 1 | q_i, j, t).
Let us also define a normalization constant Λ that denotes
the total number of search sessions during the time period
we are considering in this study.</p>
      <p>Now, we can define several objective functions for
e-commerce search, such as the following.</p>
      <p>Conversion rate per visit (CRV):
g_CRV = (1/Λ) Σ_{t=1}^{T} Σ_{i=1}^{N} Σ_{j=1}^{M} p(δ^S_{ijt} = 1 | q_i, j, t)</p>
      <sec id="sec-4-1">
        <title>Revenue per visit (RPV):</title>
        <p>g_RPV = (1/Λ) Σ_{t=1}^{T} Σ_{i=1}^{N} Σ_{j=1}^{M} p(δ^S_{ijt} = 1 | q_i, j, t) · π_j</p>
        <p>We now also define a relevance-based objective as follows:
g_REL(σ) = (1/Λ) Σ_{t=1}^{T} Σ_{i=1}^{N} RM(σ(q_i, t))</p>
        <p>
where RM can be any relevance measure, such as
normalized discounted cumulative gain (NDCG), which is
generally defined based on how well the ranked list σ(q) matches
the ideal ranking produced from human annotations of the
relevance of each item to the query. The aggregation
function does not have to be a sum (over all the queries); it
can also be, e.g., the minimum of the relevance measure over all
the queries, which would allow us to encode a preference
for avoiding any query with a very low relevance
score. Note that it is possible to define the NDCG based on
ratings created from direct user feedback such as clicks,
add-to-cart, or sales, since comparing items based on relevance
in e-commerce can be hard [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. In this paper, we assume
that the relevance function uses a rating system, without
any specific notion of how that rating system is constructed.
With this generality, we can define the following objective
for relevance:
        </p>
        <p>g_REL(σ) = min_{i ∈ [1,N], t ∈ [1,T]} RM(σ(q_i, t))
Now, we define a notion of discovery for an e-com engine.</p>
        <p>To formalize the notion of discoverability, we say that the
LTR function f is τ-discoverable if all items are shown at
least τ times. We can further define a τ-discoverability
rate as the percentage of items that are impressed at least τ
times in a fixed period of time. Let us now define
a binary variable γ_i for every item i, and assume that
γ_i = 1 if the item was shown in the search results at least τ
times and γ_i = 0 otherwise. We can express this as follows:
g_τ-discoverability = (Σ_{i=1}^{|Z|} γ_i) / |Z|</p>
        <p>We can further define another objective, sell-through rate
(STR), to denote the number of items sold in a period of time
as a percentage of the inventory. This objective represents the
efficiency of selling items faster. Let us define a binary
variable ω_i for every item i, and assume that ω_i = 1
if the item was sold and ω_i = 0 otherwise.</p>
        <p>Thus, the STR objective function when using policy σ can
be written as follows:
g_STR(σ) = (1/|Z|) Σ_{i=1}^{|Z|} ω_i</p>
        <p>Now, we define the objective of an e-com search engine as
the following:
maximize {g_CRV, g_RPV, g_REL, g_τ-discoverability, g_STR}</p>
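        <p>To make the objectives above concrete, the following sketch (our illustration, not code from the paper; all names are hypothetical) computes g_CRV, g_RPV, the τ-discoverability rate, and g_STR from simulated sale probabilities, prices, impression counts, and sale indicators.</p>

```python
import numpy as np

def ecom_objectives(p_sale, prices, impressions, sold, num_sessions, tau):
    """Compute the business and discovery objectives defined above.

    p_sale:       array (T, N, M): p(sale | query i, item j, time t)
    prices:       array (M,): item prices
    impressions:  array (M,): impression count per item
    sold:         boolean array (M,): whether the item was sold
    num_sessions: normalization constant Lambda (total search sessions)
    tau:          discoverability threshold
    """
    g_crv = p_sale.sum() / num_sessions             # conversion rate per visit
    g_rpv = (p_sale * prices).sum() / num_sessions  # revenue per visit
    g_disc = (impressions >= tau).mean()            # tau-discoverability rate
    g_str = sold.mean()                             # sell-through rate
    return g_crv, g_rpv, g_disc, g_str

# Tiny example: T=2 time steps, N=1 query, M=3 items.
vals = ecom_objectives(
    p_sale=np.full((2, 1, 3), 0.1),
    prices=np.array([10.0, 20.0, 30.0]),
    impressions=np.array([5, 0, 7]),
    sold=np.array([True, False, True]),
    num_sessions=2, tau=3,
)
```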
        <p>This is a multi-objective optimization problem with an
exploration component in the form of the discovery objective. The
priorities of these objectives can be business dependent. For
example, a business aiming to maximize the gross revenue
can prioritize maximizing revenue and on the other hand, a
business aiming to improve sales growth may want to
maximize the conversions. Similarly, a business that depends on
periodically bringing new items in the inventory may want
to prioritize the discovery objective along with sales and
revenue. In most real scenarios, such business priorities come
from the strategic directions of an e-commerce platform.</p>
        <p>We now discuss the strategies for solving this optimization
problem.</p>
      </sec>
    </sec>
    <sec id="sec-4c">
      <title>4. STRATEGIES FOR SOLVING THE OPTIMIZATION PROBLEM</title>
      <p>Since there are multiple objectives to optimize, it is
impossible to directly apply an existing learning to rank (LTR)
method to optimize all of the objectives. However, there are
multiple ways to extend an LTR method to solve the
problem, as we discuss below.</p>
    </sec>
    <sec id="sec-5">
      <title>4.1 Direct extension of LTR</title>
      <p>One simple strategy is to combine the multiple objectives
with appropriate weighting parameters so as to form one
single objective function, which can then be used in a
traditional LTR framework to find a ranking that would
optimize the consolidated objective function. The weights can
be based on the business priorities. The advantage of this
approach is that we can directly build on the existing LTR
framework, though the new objective function would pose
new challenges in designing effective and efficient
optimization algorithms to actually compute optimal rankings. One
disadvantage of this strategy is that we cannot easily control
the tradeoff between different objectives (e.g., we sometimes
may want to set a lower bound on one objective rather than
maximize it). To address this limitation, we may allow
some objectives to be treated as constraints.</p>
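      <p>As a minimal illustration of this weighted-combination strategy (our sketch; the objective values and weights below are hypothetical):</p>

```python
def scalarized_objective(objectives, weights):
    """Combine several ranking objectives into one score using
    business-priority weights (larger weight = higher priority)."""
    return sum(weights[name] * value for name, value in objectives.items())

# A business prioritizing revenue over relevance and conversions:
score = scalarized_objective(
    {"g_CRV": 0.02, "g_RPV": 1.5, "g_REL": 0.8},
    {"g_CRV": 1.0, "g_RPV": 5.0, "g_REL": 2.0},
)
```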
    </sec>
    <sec id="sec-5b">
      <title>4.2 Incremental Optimization of e-Com Search</title>
      <p>An alternative strategy is to take an existing LTR ranking
function as a basis and seek to improve the ranking (e.g., by
perturbation) so as to optimize multiple objectives as
described above; such an incremental optimization strategy is
more tractable as we will be searching for solutions to the
optimization problem in the neighborhood of an existing
ranking function. Specifically, we instantiate the optimization
problem discussed in the previous section by assuming that
the engine incrementally perturbs a learning to rank
function slightly, in a way that the original relevance does not
vary too much, and then each function in σ = (f_1, ..., f_T) at
time steps t = 1 to t = T gradually improves upon the multiple
objectives. Intuitively, we start with a function f and
generate the next function in the "neighborhood" of the existing
ranking function f within a relevance bound, and we take
each step in a way that attempts to improve sales and
revenue with high probability while also trying to achieve
better τ-discoverability and STR for the items in the inventory.
However, the search space of all such ranking functions is
still intractable, and we need a heuristic to obtain a "good"
Pareto-optimal solution for the multi-objective optimization
problem.</p>
      <p>We thus need to construct an optimization framework that
integrates an exploration mechanism with a learning to rank
paradigm.</p>
    </sec>
    <sec id="sec-6">
      <title>4.3 Discovery Framework</title>
      <p>One way of constructing a framework that optimizes an
e-com site for multiple business objectives along with an
exploration option is to use a fixed number of rank
positions, say x, for exploration and use the rest of the top
(K − x) positions for exploitation. In this set-up, we can
construct an explore set of items E in which all the items have
been shown fewer than τ times. We then select the top x
candidates for a query q_i from the intersection of the recall set of
items for the query and the set E. Let us call this set
corresponding to query q_i as E_i. The items selected for the rest
of the (K − x) positions can be based on a suitable
multi-objective LTR ranking function. This framework
is practical, and the cost of the exploration can be estimated
by the loss of revenue from not selecting the most optimal
items for the x positions. Note that it is also possible to select any
rank positions to show the explored items. However, in this
paper, we assume that we keep those x items at the top, to
keep our framework simple to analyze.</p>
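      <p>The slot-splitting scheme above can be sketched as follows (our illustration; all names are hypothetical, and the explore slots here are filled arbitrarily, whereas Section 5 presents principled selection strategies).</p>

```python
def compose_results(query_recall, explore_set, ltr_scores, K, x):
    """Fill a K-slot result page: top x slots from the explore set E,
    the remaining (K - x) slots ranked by the LTR score.

    query_recall: recall set R_i (candidate item ids for the query)
    explore_set:  set E of items shown fewer than tau times
    ltr_scores:   dict item id -> LTR score
    """
    explore_slots = [i for i in query_recall if i in explore_set][:x]
    remaining = [i for i in query_recall if i not in explore_slots]
    exploit_slots = sorted(remaining, key=lambda i: ltr_scores[i],
                           reverse=True)[:K - x]
    return explore_slots + exploit_slots

page = compose_results(
    query_recall=["a", "b", "c", "d", "e"],
    explore_set={"d", "e"},
    ltr_scores={"a": 0.9, "b": 0.7, "c": 0.5, "d": 0.1, "e": 0.2},
    K=4, x=1,
)
```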
    </sec>
    <sec id="sec-7">
      <title>4.4 LTR and Discoverability</title>
      <p>Now that we have a formal model for the e-commerce
search optimization problem, we want to establish the
following point:</p>
      <p>An LTR function is not optimal for the discoverability
criterion.</p>
      <p>In an e-com site, only the top K items are shown to the
customers. Suppose there is a new item ζ that is
relevant to a set of queries Q′. Now, assume that for each
query q ∈ Q′ there are K other relevant old items for
which the behavioral-feature score f_2(X_2) &gt; 0. Assuming that these K items are
equally relevant, we can say that ζ has the same value
of f_1(X_1) as the K old items. Then, ζ will
always rank below the top K items for the set of queries
in Q′. If the current number of impressions for ζ
is less than τ, then it has no chance of receiving an
impression under a ranking framework using the function f,
because it will always be ranked below the top K items
unless there is a new query that ranks this item higher.
This means that as long as we can find a percentage of
items that receive fewer than τ impressions, it is hard to
improve the τ-discoverability using a traditional LTR
mechanism without an integrated discovery
mechanism.</p>
    </sec>
    <sec id="sec-8">
      <title>5. DISCOVERY STRATEGIES</title>
      <p>In this section, we propose several strategies for selecting the x
items for discovery that aim to minimize the loss of business
metrics.</p>
    </sec>
    <sec id="sec-9">
      <title>5.1 Auction-based strategy</title>
      <p>
        In this strategy, the items for the top x positions can be
assigned based on an auction, in which sellers bid
for the positions. This strategy has been used in sponsored
search by search engine companies such as Google and in
sponsored products by e-commerce companies such as
Amazon. The design of such an auction mechanism, balancing
revenue and relevance, is a complex subject and has been
discussed in recent papers by Ghose et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
          ] and Athey et
al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
          ]. However, many e-commerce businesses may not want
to use an auction strategy to aid discovery, since it can hurt the
growth of seller participation on those sites.
      </p>
    </sec>
    <sec id="sec-10">
      <title>5.2 Exploration with LTR (eLTR)</title>
      <p>
        This is a strategy discussed in a recent paper by Goswami
et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. In this strategy, we define the set from which
the LTR function selects the items as L_i ⊆ R_i for a given
query q_i. We assume that all the items outside the set L_i are
not τ-discoverable. Then, L = ∪_{i=1}^{N} L_i is the set of all
discoverable items. Hence, the set E = R \ L, where R = ∪_{i=1}^{N} R_i,
consists of all the items that require exploration.
      </p>
      <p>
        Goswami et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] discuss three strategies to incorporate
discovery in an e-commerce search.
      </p>
      <p>Random selection based exploration from the
recall set (RSE): This is a baseline strategy for continuous
exploration with an LTR algorithm. Here, for every query
q_i, we randomly select x items from the set E ∩ R_i. Then, we
put these x items on top of the other (K − x) items that are
selected using LTR from the set R_i. The regret here will be
linear in the number of search sessions with exploration.</p>
      <p>Upper confidence bound (UCB) based exploration
from the recall set (UCBE): This is another simple
strategy that uses a variant of a UCB-based algorithm for
exploration instead of random sampling. Here, we maintain a
MAB for each query. We consider each item in the set
E ∩ R_i as an arm of the MAB corresponding to a query
q_i. We maintain a UCB score for each of those items based
on sales over impressions for the query. If an item j in
the set E ∩ R_i is shown b_j times in T iterations and is
sold a_j times in between, then the UCB score of the item j
is ucb_j = a_j/b_j + √(2 log(T + 1)/b_j). Note that this is for a
specific query. We then select x items based on the top UCB scores.</p>
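      <p>A minimal sketch of the per-query UCBE selection (our illustration; the UCB score follows the definition above, and all names are hypothetical):</p>

```python
import math

def ucb_scores(sales, impressions, T):
    """UCB score per item: empirical sale rate plus an exploration bonus.

    sales:       dict item id -> a_j (times sold); missing key means 0
    impressions: dict item id -> b_j (times shown, must be > 0)
    T:           number of iterations so far
    """
    return {j: sales.get(j, 0) / b_j + math.sqrt(2 * math.log(T + 1) / b_j)
            for j, b_j in impressions.items()}

def select_explore_items(sales, impressions, T, x):
    """Pick the x explore items with the highest UCB scores."""
    scores = ucb_scores(sales, impressions, T)
    return sorted(scores, key=scores.get, reverse=True)[:x]

# Rarely shown items receive a large bonus and are explored first.
picked = select_explore_items(
    sales={"a": 3, "b": 1}, impressions={"a": 10, "b": 2, "c": 1}, T=13, x=2,
)
```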
      <p>Explore LTR (eLTR): Here, we define a function that
we call explore LTR (eLTR) to select the x items. The rest
of the items for the top K can be chosen using the traditional
LTR. Then, we can either keep the x items on top, or we can
rerank all K items based on eLTR.</p>
      <p>The main motivation for eLTR is the observation that
there is inherent overfitting in the regular ranking function
used in an e-com search engine that hinders exploration,
i.e., hinders improvement of τ-discoverability and STR. The
overfitting is mainly caused by a subset of features derived
from user interaction data. Such features are very important,
as they help infer a user's preferences, and the overfitting
is actually desirable for many repeated queries and for
items that have sustained interest to users (since they are
"test cases" that have occurred in the training data), but
it gives old items a biased advantage over new items,
limiting the increase of τ-discoverability and STR. The
main idea behind eLTR is thus to separate such features
and introduce a parameter to restrict their influence on the
ranking function, indirectly promoting STR. Formally, we
note that, in general, a ranking function can be written in
the following form:</p>
      <p>y = f(X) = g(f_1(X_1), f_2(X_2))
where y ∈ R denotes a ranking score, X ∈ R^N is an
N-dimensional feature vector, and X_1 ∈ R^{N_1} and X_2 ∈ R^{N_2}
are two different groups of features such that N_1 + N_2 =
N and X_1 ∪ X_2 = X. The two groups of features are meant to
distinguish features that are unbiased (e.g., content
matching features) from those that are inherently biased (e.g.,
clickthrough-based features). Here g is an aggregation
function which is monotonic with respect to both arguments. It
is easy to show that any linear model can be written as a
monotonic aggregation function. It is not possible to use
such a representation for models such as additive trees.
However, our previous techniques do not have such a limitation,
since they are completely separated from the LTR. In this
paper, we keep our discussion limited to linear models. We
now define the explore LTR (eLTR) function as follows:
y_e = f_e(X) = g(f_1(X_1), η f_2(X_2))
where y_e ∈ R and 0 ≤ η ≤ 1 is a variable in our algorithmic
framework. Since g is monotonic, f_e(X) ≤ f(X) when
η ≤ 1. Since the feature set X_2 is a biased feature set favoring
old items, we can expect ranking based on f_e to be more
in favor of new items in comparison with the original f,
achieving the goal of emphasizing exploration of new items.
Note that η controls the amount of exploration: the smaller
η is, the more exploration (at the cost of exploitation). Since
the maximum exploration is achieved when η = 0, in which
case the ranking relies entirely on f_1, the only loss in the
original objective function is incurred by the removal of f_2.
By controlling which features are included in f_2, we can
control the upper bound of the loss. In this sense, eLTR
ensures a "safe" exploration strategy, since f_1 is always active.
Note that this composed function gradually becomes the LTR
function as η tends to 1. There can be various ways of
constructing the η, such as the following.
eLTR basic exploration (eLTRb): In this strategy,
we keep η = I/T_max. Here, I is the iteration index and T_max is the
maximum number of iterations after which everything can
be reset. This is a very simple strategy where eLTR
gradually increases the importance of the behavioral features
with every iteration.</p>
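      <p>For a linear model, the eLTR score and the eLTRb schedule can be sketched as follows (our illustration; here the monotonic aggregation g is a simple sum, and all names are hypothetical):</p>

```python
import numpy as np

def eltr_score(x1, w1, x2, w2, eta):
    """eLTR score for a linear model: y_e = f1(X1) + eta * f2(X2).

    x1, w1: unbiased (content-matching) features and their weights
    x2, w2: biased (behavioral) features and their weights
    eta:    exploration control in [0, 1]; eta = 1 recovers the LTR score
    """
    return float(np.dot(w1, x1) + eta * np.dot(w2, x2))

def eta_basic(iteration, t_max):
    """eLTRb schedule: eta = I / T_max, growing linearly toward 1."""
    return min(iteration / t_max, 1.0)

x1, w1 = np.array([1.0, 0.5]), np.array([0.4, 0.2])
x2, w2 = np.array([2.0]), np.array([0.3])
full = eltr_score(x1, w1, x2, w2, eta=1.0)               # original LTR score
explore = eltr_score(x1, w1, x2, w2, eta_basic(5, 100))  # early iteration
```

      <p>Since the behavioral term is down-weighted, the explore score never exceeds the full LTR score, which reflects the "safe" exploration property noted above.</p>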
      <p>eLTR UCB-weighted exploration (eLTRu): In this
strategy, we keep η = ucb_j / U_j. Here, U_j is a normalization
factor, and in our experiment it is chosen to be the maximum
UCB score in the set E ∩ R_i.</p>
      <p>eLTR UCB-weighted exploration and reranking
(eLTRur): This strategy first selects the top x items using
eLTRu, then selects the remaining (K − x) items using the
classic LTR, and then reranks the K items using eLTRu.</p>
      <p>
        Goswami et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
          ] show in their paper that eLTR can
be considered a monotone submodular function and can
thus be well approximated by a greedy algorithm.
      </p>
    </sec>
    <sec id="sec-10b">
      <title>5.3 Adaptive submodular function based eLTR (AeLTR)</title>
      <p>
        This is an extension of the eLTR mechanism presented by
Goswami et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
          ]. However, it makes the strategy much
more general, allowing us to use any LTR ranking function and
going beyond the limitation of linear LTR models. Here,
we use the following function:
      </p>
      <p>y_a = f_a(X) = α f(X)</p>
      <sec id="sec-10-1">
        <title>Here, α is a function of the UCB score as follows:</title>
        <p>
          α = ucb_j / U_j.
This algorithm has been discussed in a paper by Gabillon
et al. [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] and is proven to also be a monotone submodular
function. Hence, a simple greedy algorithm can also give a good
approximation here. We do not, however, know how
this strategy compares, in terms of losses and the
discovery metric, with the other approaches that
we have discussed in this paper.
        </p>
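        <p>A minimal sketch of the AeLTR scoring rule above (our illustration; clipping α to [0, 1] is our assumption, and all names are hypothetical):</p>

```python
import math

def aeltr_score(ltr_score, a_j, b_j, T, u_max):
    """AeLTR: scale an arbitrary LTR score by alpha = ucb_j / U_j.

    ltr_score: score from any (not necessarily linear) LTR model
    a_j, b_j:  times item j was sold / shown
    T:         iterations so far
    u_max:     normalization factor U_j (e.g., max UCB score in E ∩ R_i)
    """
    ucb_j = a_j / b_j + math.sqrt(2 * math.log(T + 1) / b_j)
    alpha = min(ucb_j / u_max, 1.0)  # clipping is our assumption, keeps alpha in [0, 1]
    return alpha * ltr_score

s_hi = aeltr_score(2.0, 1, 4, 3, u_max=2.0)
s_lo = aeltr_score(2.0, 1, 4, 3, u_max=4.0)  # stronger normalization, smaller score
```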
      </sec>
    </sec>
    <sec id="sec-11">
      <title>5.4 Regret of the eLTR strategies</title>
      <p>The RSE strategy has linear regret since it is random. Its
regret can be arbitrarily bad and can thus be given
by O(R_l). The regret of the UCBE algorithm can be given
by O(log(R_l)). Both eLTR and AeLTR have similar regret,
because they both use the monotone submodularity property.
These algorithms pull items using a score that is a
function of the LTR score and are expected to incur a smaller
loss compared to the approaches that are based on
multi-armed bandit algorithms. However, it is not immediately
clear how these algorithms can optimize the discoverability
objective. The regret of the eLTR group of algorithms can
be estimated as at most O((1 + 1/e) |E| / x) times worse than the
optimal, which comes from their monotone submodularity
property. The eLTR algorithms can be considered
similar to the ε-greedy approaches, where the regret bound is not
better than that of the UCB algorithm. However, in our simulation,
we want to study the discoverability metrics for such eLTR
algorithms.</p>
    </sec>
    <sec id="sec-12">
      <title>6. EVALUATION METHODOLOGY</title>
      <p>Due to the involvement of multiple objectives, the
evaluation of E-Com search algorithms also presents new
challenges. Here we discuss some ideas for evaluating the
proposed e-LTR algorithm, which we hope will stimulate more
work in this direction.</p>
      <p>Given a set of queries and an initial ranking function, an
e-LTR method is expected to "discover" the ideal ranking
for each query over a sequence of iterations. The ideal ranking
may be defined as one that maximizes business metrics like
sales, revenue, etc. Since exploration is involved, one can
expect the quality of ranking (NDCG), and consequently
sales/revenue, to fluctuate for individual queries during the
discovery process. However, the benefit of an e-LTR method
lies in its ability to deliver greater aggregate sales/revenue
over some reasonable number of iterations N, compared to
a non-discovery LTR method. We must thus compare our
methods on the iterative growth in ranking quality and the
aggregate gain in business metrics.</p>
      <p>The ideal approach for conducting such an evaluation would
require simultaneously deploying all candidate methods to
live user traffic and computing various user engagement
metrics such as click-through rate, sales, revenue, etc.
However, this strategy is difficult to implement in practice. Since
the user traffic received by each candidate method is different,
we would need to direct a substantial amount of traffic to each method
to make the observations comparable and the conclusions
statistically significant. Deployment of a large number of
experimental and likely sub-optimal ranking functions, especially
when evaluating baselines, can result in significant business
losses for e-commerce search engines. More importantly,
such an approach of using live user traffic would not allow
us to fairly compare any future algorithm with
the ones that we test today, so it does not facilitate further
study of the problem.</p>
      <p>
        Perhaps a good and feasible strategy is to design a
simulation-based counterfactual evaluation method [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Here, unlike with a
traditional static test collection, we also have to introduce
models that simulate user behavior and capture the biases
observed in a real system. This can be a research topic in itself, and we omit
further discussion as it is out of scope for this short
paper.
      </p>
    </sec>
    <sec id="sec-13">
      <title>EXPERIMENTAL RESULTS</title>
      <p>In this section, we first construct a synthetic historical
dataset with queries, items, and their prices. We also
generate the true purchase probabilities and utility scores for
each item and query pair. Additionally, we use a specific rank
function to simulate the behavior of a trained LTR model.</p>
      <p>Then we conduct a simulation as described in Section ??
with various exploration strategies. During the simulation
we use the observed purchase probabilities estimated from
the purchase feedback as the most important feature for the
rank function, but we use the true probabilities generated
during the initial data-generation phase to simulate the user
behavior.</p>
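      <p>The observed purchase probabilities above can be maintained as simple empirical rates updated from the simulated purchase feedback. A minimal sketch, with hypothetical class and method names, assuming a plain count-based estimator:</p>

```python
from collections import defaultdict

class ObservedPurchaseRate:
    """Running empirical estimate of the purchase probability of each
    (query, item) pair, updated from simulated purchase feedback.
    The class and its API are illustrative assumptions."""

    def __init__(self):
        self.shown = defaultdict(int)   # impressions per (query, item)
        self.bought = defaultdict(int)  # purchases per (query, item)

    def update(self, query, item, purchased):
        self.shown[(query, item)] += 1
        if purchased:
            self.bought[(query, item)] += 1

    def estimate(self, query, item, default=0.0):
        n = self.shown[(query, item)]
        return self.bought[(query, item)] / n if n else default
```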
      <p>The main goal of this experimental study is to evaluate
the behavior of the exploration strategies under different
distributions of the utility scores, representing different states of
the inventory in an e-com company.</p>
      <p>
        We evaluate our algorithms by running the simulation for
T iterations. We compute RPV and ε-discoverability at the end
of the T iterations. We also compute a purchase-based mean
reciprocal rank [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] metric (MRR). This metric is computed
by summing the reciprocal ranks of all the items that are
purchased across user visits for all queries. Moreover,
we also discretize our gold utility score between 1 and 5 and
generate a rating for each item. This allows us to
compute a mean NDCG score at the k-th position over all search
sessions as a relevance metric.
      </p>
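      <p>The purchase-based MRR and rating-based NDCG described above can be sketched as follows. This is a simplified illustration; the exact averaging used in the paper may differ, and the function names are our own.</p>

```python
import math

def mean_reciprocal_rank(purchase_ranks):
    """MRR over visits: purchase_ranks holds the 1-based rank of the
    purchased item in each visit, or None when nothing was bought."""
    rr = [1.0 / r for r in purchase_ranks if r is not None]
    return sum(rr) / len(purchase_ranks) if purchase_ranks else 0.0

def ndcg_at_k(ratings, k):
    """NDCG@k for one ranked list of integer ratings (e.g. 1..5),
    using the standard exponential-gain, log-discount form."""
    def dcg(rs):
        return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rs[:k]))
    ideal = dcg(sorted(ratings, reverse=True))
    return dcg(ratings) / ideal if ideal else 0.0
```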
      <p>We expect the RPV and NDCG of the LTR
function to be the best, whereas the ε-discoverability
values will be better under ranking policies that use an exploration
strategy. The new ranking strategies will incur a loss in RPV
and NDCG, and based on our theoretical analysis we
expect the eLTR methods to have a smaller loss in those measures than the
RSE and UCB based approaches. We
also expect to see a loss in MRR for all exploration
methods. However, we are mainly interested in observing how these
algorithms perform on the ε-discoverability metric compared to
LTR.</p>
    </sec>
    <sec id="sec-14">
      <title>Synthetic data generation</title>
      <p>We first generate a set of N queries and M items. We
then assign the prices of the items by randomly drawing a
number between a minimum and a maximum price from a
multi-modal Gaussian distribution that can have between 1 and
10 peaks for a query. We select the specific number of peaks
for a query uniformly at random. We also assign a subset of
the peak prices to each query, to be considered as its set of
preferred prices. This creates a situation where every query
may have a few preferred price peaks at which it may
also have its sales or revenue peaks.</p>
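      <p>The price-generation step above can be sketched as follows. The helper name, the mixture spread, and the clamping to the price range are our illustrative assumptions; only the 1-to-10 peak count, the uniformly random choice of peaks, and the preferred-price subset come from the text.</p>

```python
import random

def generate_prices(n_items, min_price, max_price, rng):
    """Draw item prices for one query from a Gaussian mixture with
    1..10 peaks, clamped to [min_price, max_price]. Also return the
    query's preferred prices as a random subset of the peaks."""
    n_peaks = rng.randint(1, 10)
    peaks = [rng.uniform(min_price, max_price) for _ in range(n_peaks)]
    sigma = (max_price - min_price) / 20.0  # assumed mixture spread
    prices = []
    for _ in range(n_items):
        mu = rng.choice(peaks)                       # pick a peak
        p = rng.gauss(mu, sigma)                     # sample around it
        prices.append(min(max(p, min_price), max_price))
    preferred = rng.sample(peaks, rng.randint(1, n_peaks))
    return prices, preferred
```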
      <p>
        Now that we have the items and queries defined, we
randomly generate a utility score, denoted by uij, for every
item j for a query qi. In our setup, we use uniform
random, Gaussian, and long-tailed distributions for selecting
the utilities. These three different distributions represent
three scenarios for a typical e-com company's inventory.
Additionally, we generate a purchase probability between 0.02
and 0.40 for every item for every query. We generate these
probabilities such that they correlate with the utility scores,
with a statistically significant
(p-value less than 0.10) Pearson correlation coefficient [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ].
We also intend to correlate the purchase probability with
the preferred peak prices for a query. Hence, we give an
additive boost between 0 and 0.1 to the purchase probability
in proportion to the absolute difference of the price of the
item from the closest preferred mean price for that query.
By generating the purchase probabilities in this way, we
ensure that the actual purchase probabilities are related to the
preferred prices for the queries as well as to the
utility scores of the items for a given query. Now, we
define an ε-discoverability rate ε = b and select b·M items
randomly from the set of all items to form the set E. In our simulation, we
assume that the estimated (observed) purchase probability
for all the items in the set E is zero at the beginning. The
purchase probabilities of the rest of the items are assumed to be
estimated correctly at the beginning. Now, we create a
simple rank function that is a weighted linear function of the
utility score (u), the observed purchase probability (po), and
the normalized absolute difference between the product price
and the closest preferred mean price (Δm̂p) for the query, such
that l = 0.60po + 0.20u + 0.20Δm̂p. Here l denotes the score
of the ranker. This ranker simulates a trained LTR function
in e-com search, where a sale is usually considered
the most valuable behavioral signal.
      </p>
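      <p>A minimal sketch of the linear ranker l = 0.60po + 0.20u + 0.20Δm̂p described above. The normalization of the price term by the maximum price and the sign convention (treating closeness to a preferred price as the feature, so that nearer prices score higher) are our assumptions about the intended direction of the feature:</p>

```python
def ranker_score(p_obs, utility, price, preferred_prices, max_price):
    """Hypothetical linear LTR surrogate from the text:
    l = 0.60*po + 0.20*u + 0.20*price-closeness."""
    gap = min(abs(price - m) for m in preferred_prices)
    closeness = 1.0 - gap / max_price  # assumed normalization to [0, 1]
    return 0.60 * p_obs + 0.20 * utility + 0.20 * closeness
```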
      <p>
        We now construct a user model. Here, a user browses
through the search results one after another from the top
and can purchase an item based on that item's purchase
probability for the query. Note that, in order to keep the
simulation simple, we assume a user purchases at most one item
in one visit and leaves the site after that. We also
apply a logarithmic discount to the probability of purchase
at each lower rank by multiplying it by 1/log2(r + 1), where r is the
ranking position of the item. This creates the effect of
position bias [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
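      <p>The user model above can be sketched as a single-purchase browsing loop with the 1/log2(r + 1) position-bias discount (function names are illustrative):</p>

```python
import math
import random

def simulate_visit(ranked_items, purchase_prob, rng):
    """One user visit: scan results top-down and buy at most one item,
    with probability purchase_prob[item] discounted by 1/log2(r + 1)
    at rank r; the user leaves after the first purchase."""
    for r, item in enumerate(ranked_items, start=1):
        p = purchase_prob[item] / math.log2(r + 1)  # no discount at r = 1
        if p > rng.random():
            return item, r      # purchased item and its rank
    return None, None           # user leaves without buying
```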
    </sec>
    <sec id="sec-15">
      <title>Description of the experimental study</title>
      <p>We conduct two sets of experiments with this simulated
data.</p>
      <p>In the first set of experiments we generate the utility scores
from a Gaussian distribution with mean 0.5 and variance 0.1,
keeping everything else the same as in the setup described
above. This means that the utility of the items
in the inventory follows a normal distribution.
Table 1 shows the
final metrics for all the algorithms. We observe
that with a Gaussian distribution of utility scores the eLTR
approaches have better MRR and ε-discoverability. We
observe that the AeLTR approach works well, producing
good MRR while its ε-discoverability is better than LTR's.
This is a powerful result, since it shows that it is possible to
integrate this approach with more sophisticated LTR algorithms.</p>
      <p>In the second set of experiments, we use a power law to
generate the utility distribution. This means that only a
small set of items can be considered valuable in this
scenario. Table 2 shows the final metrics for this case.
We notice that even with this distribution of utility scores
the eLTR variants have a smaller loss in RPV, NDCG, and
MRR. Note that with this distribution, discoverability
can be considered naturally less useful, since a large
number of items are not that valuable. We expect that in such a
situation a good discoverability algorithm can help
eliminate items that do not get sold after sufficient exposure and
enable the e-com company to optimize its inventory. We
observe a very similar trend for the AeLTR algorithm as in our
previous experiment. It gives better RPV, and MRR close
to that of the LTR algorithm, but it has less ε-discoverability than
the eLTR approaches.</p>
    </sec>
    <sec id="sec-16">
      <title>CONCLUSIONS AND FUTURE WORK</title>
      <p>This paper represents a first step toward formalizing the
emerging E-Com search problem as an optimization
problem with multiple objectives, including sales (CRV),
revenue (RPV), and discoverability, besides relevance. We
formally define these objectives and discuss multiple
strategies for conducting exploration with learning-to-rank
algorithms. We hope that our work will open up many new
research directions for optimizing e-com search. The
obvious next step is to empirically validate the strategies on
log data from an e-com search engine. The
theoretical framework also enables many interesting ways to further
formalize the e-com search problem and develop new
effective e-com search algorithms. Finally, the proposed
discovery framework is just a small step toward solving the new
problem of optimizing discoverability in e-com search; it is
important to further develop more effective algorithms by
using these algorithms as baselines.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Athey</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Nekipelov</surname>
          </string-name>
          .
          <article-title>A structural model of sponsored search advertising auctions</article-title>
          .
          <source>In Sixth ad auctions workshop</source>
          , volume
          <volume>15</volume>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Cesa-Bianchi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Fischer</surname>
          </string-name>
          .
          <article-title>Finite-time analysis of the multiarmed bandit problem</article-title>
          .
          <source>Machine learning</source>
          ,
          <volume>47</volume>
          (
          <issue>2-3</issue>
          ):
          <volume>235</volume>
–
          <fpage>256</fpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Burges</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Shaked</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Renshaw</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lazier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Deeds</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Hamilton</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Hullender</surname>
          </string-name>
          .
          <article-title>Learning to rank using gradient descent</article-title>
          .
          <source>In Proceedings of the 22nd international conference on Machine learning</source>
          , pages
          <volume>89</volume>
–
          <fpage>96</fpage>
          . ACM,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Burges</surname>
          </string-name>
          .
          <article-title>From ranknet to lambdarank to lambdamart: An overview</article-title>
          .
          <source>Learning</source>
          ,
          <volume>11</volume>
          (
          <fpage>23</fpage>
          -581):
          <fpage>81</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Burges</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ragno</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          .
          <article-title>Learning to rank with nonsmooth cost functions</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          , pages
          <volume>193</volume>
–
          <fpage>200</fpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>N.</given-names>
            <surname>Craswell</surname>
          </string-name>
          .
          <article-title>Mean reciprocal rank</article-title>
          .
          <source>In Encyclopedia of Database Systems</source>
          , pages
          <fpage>1703</fpage>
–
          <fpage>1703</fpage>
          . Springer,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>N.</given-names>
            <surname>Craswell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Zoeter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Taylor</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Ramsey</surname>
          </string-name>
          .
          <article-title>An experimental comparison of click position-bias models</article-title>
          .
          <source>In Proceedings of the 2008 International Conference on Web Search and Data Mining, WSDM '08</source>
          , pages
          <fpage>87</fpage>
–
          <fpage>94</fpage>
          . ACM,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Friedman</surname>
          </string-name>
          .
          <article-title>Greedy function approximation: a gradient boosting machine</article-title>
          .
          <source>Annals of statistics</source>
          , pages
          <volume>1189</volume>
–
          <fpage>1232</fpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>V.</given-names>
            <surname>Gabillon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kveton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Eriksson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Muthukrishnan</surname>
          </string-name>
          .
          <article-title>Adaptive submodular maximization in bandit setting</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          , pages
          <fpage>2697</fpage>
–
          <fpage>2705</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ghose</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Yang</surname>
          </string-name>
          .
          <article-title>An empirical analysis of search engine advertising: Sponsored search in electronic markets</article-title>
          .
          <source>Management Science</source>
          ,
          <volume>55</volume>
          (
          <issue>10</issue>
          ):
          <volume>1605</volume>
–
          <fpage>1622</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Gittins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Glazebrook</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Weber</surname>
          </string-name>
          .
          <article-title>Multi-armed bandit allocation indices</article-title>
          . John Wiley &amp; Sons,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Goswami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhai</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Mohapatra</surname>
          </string-name>
          .
          <article-title>Learning to rank and discover for e-commerce search</article-title>
          .
          <source>In International Conference on Machine Learning and Data Mining in Pattern Recognition.</source>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>L.</given-names>
            <surname>Hang</surname>
          </string-name>
          .
          <article-title>A short introduction to learning to rank</article-title>
          .
          <source>IEICE TRANSACTIONS on Information and Systems</source>
          ,
          <volume>94</volume>
          (
          <issue>10</issue>
          ):
          <year>1854</year>
–
          <year>1862</year>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>K.</given-names>
            <surname>Järvelin</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Kekäläinen</surname>
          </string-name>
          .
          <article-title>IR evaluation methods for retrieving highly relevant documents</article-title>
          .
          <source>In Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <volume>41</volume>
–
          <fpage>48</fpage>
          . ACM,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>T.</given-names>
            <surname>Joachims</surname>
          </string-name>
          .
          <article-title>Optimizing search engines using clickthrough data</article-title>
          .
          <source>In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          , pages
          <volume>133</volume>
–
          <fpage>142</fpage>
          . ACM,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Karmaker Santu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sondhi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhai</surname>
          </string-name>
          .
          <article-title>On application of learning to rank for e-commerce search</article-title>
          .
          <source>In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '17</source>
          , pages
          <fpage>475</fpage>
–
          <fpage>484</fpage>
          , New York, NY, USA,
          <year>2017</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <article-title>A short introduction to learning to rank</article-title>
          .
          <source>IEICE TRANSACTIONS on Information and Systems</source>
          ,
          <volume>94</volume>
          (
          <issue>10</issue>
          ):
          <year>1854</year>
–
          <year>1862</year>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kleban</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Gupta</surname>
          </string-name>
          .
          <article-title>Counterfactual estimation and optimization of click metrics in search engines: A case study</article-title>
          .
          <source>In Proceedings of the 24th International Conference on World Wide Web</source>
          , pages
          <volume>929</volume>
–
          <fpage>934</fpage>
          . ACM,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>T.-Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <article-title>Learning to rank for information retrieval</article-title>
          .
          <source>Foundations and Trends in Information Retrieval</source>
          ,
          <volume>3</volume>
          (
          <issue>3</issue>
          ):
          <volume>225</volume>
–
          <fpage>331</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>A.</given-names>
            <surname>Mahajan</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Teneketzis</surname>
          </string-name>
          .
          <article-title>Multi-armed bandit problems</article-title>
          .
          <source>In Foundations and Applications of Sensor Management</source>
          , pages
          <volume>121</volume>
–
          <fpage>151</fpage>
          . Springer,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>D.</given-names>
            <surname>Sorokina</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Cantu-Paz</surname>
          </string-name>
          .
          <article-title>Amazon search: The joy of ranking products</article-title>
          .
          <source>In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '16</source>
          , pages
          <fpage>459</fpage>
–
          <fpage>460</fpage>
          , New York, NY, USA,
          <year>2016</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>K. M.</given-names>
            <surname>Svore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. N.</given-names>
            <surname>Volkovs</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Burges</surname>
          </string-name>
          .
          <article-title>Learning to rank with multiple objective functions</article-title>
          .
          <source>In Proceedings of the 20th international conference on World wide web</source>
          , pages
          <volume>367</volume>
–
          <fpage>376</fpage>
          . ACM,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>J.</given-names>
            <surname>Vermorel</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Mohri</surname>
          </string-name>
          .
          <article-title>Multi-armed bandit algorithms and empirical evaluation</article-title>
          .
          <source>In European conference on machine learning</source>
          , pages
          <volume>437</volume>
–
          <fpage>448</fpage>
          . Springer,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Wilcox</surname>
          </string-name>
          .
          <article-title>Introduction to robust estimation and hypothesis testing</article-title>
          . Academic press,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Broder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kleinberg</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Joachims</surname>
          </string-name>
          .
          <article-title>The k-armed dueling bandits problem</article-title>
          .
          <source>Journal of Computer and System Sciences</source>
          ,
          <volume>78</volume>
          (
          <issue>5</issue>
          ):
          <volume>1538</volume>
–
          <fpage>1556</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>