Towards Optimization of E-Commerce Search and Discovery

Anjan Goswami, University of California, Davis (agoswami@ucdavis.edu)
ChengXiang Zhai, University of Illinois Urbana-Champaign, IL (czhai@illinois.edu)
Prasant Mohapatra, University of California, Davis (pmohapatra@ucdavis.edu)

ABSTRACT
E-Commerce (E-Com) search is an emerging problem with multiple new challenges. One of the primary challenges is optimizing multiple objectives involving business metrics such as sales and revenue while maintaining a discovery strategy for the site. In this paper, we formalize the e-com search problem for optimizing metrics based on sales, revenue, and relevance. We define a notion of item discoverability in search and show that learning to rank (LTR) algorithms trained with behavioral features from e-com customer interactions (e.g., clicks, cart-adds, orders) do not by themselves address the discoverability problem. Instead, a suitable explore-exploit framework must be integrated with the ranking algorithm. We thus construct a practical discovery strategy by reserving a few top positions for discovery and populating them with items selected through exploration. Then, we present a few exploration strategies with low regret bounds in terms of business metrics. We conduct a simulation study with a synthetically generated dataset that represents items with different utility distributions, and we compare these strategies using metrics based on sales, revenue, relevance, and discovery. We find that a strategy based on an adaptive submodular discovery framework can provide a nice balance of business metrics and discoverability compared to strategies based on random exploration or multi-armed bandits. However, another strategy, based on a monotonic submodular optimization function that needs to be integrated with linear LTR models, also works well for discovery and has nice performance with respect to sales, revenue, and relevance.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: E-Com Search

General Terms
Algorithms

Keywords
e-com search, retrieval models, learning to rank, discoverability, exploration-exploitation

Copyright 2018 by the paper's authors. Copying permitted for private and academic purposes. In: J. Degenhardt, G. Di Fabbrizio, S. Kallumadi, M. Kumar, Y.-C. Lin, A. Trotman, H. Zhao (eds.): Proceedings of the SIGIR 2018 eCom workshop, 12 July, 2018, Ann Arbor, Michigan, USA, published at http://ceur-ws.org

1. INTRODUCTION
One of the most critical components of an e-commerce (e-com) marketplace is its search functionality. The goal of an e-commerce search engine is to help buyers find and purchase their desired products. The buyers' intent represents the demand side, and the products listed by the sellers represent the supply side of an e-com platform. The e-com search engine can thus be considered to provide an efficient matching platform for the supply and demand sides and to facilitate the transactions that generate revenue. Like a web search engine, an e-com search engine also intends to ensure user satisfaction by presenting the most relevant items that a user is searching for. However, an e-com search engine generates revenue directly based on the products presented to the users; this is in sharp contrast with how a web search engine generates revenue through advertising, and it raises significant new challenges for designing effective search algorithms. Thus, besides the usual goal of maximizing the relevance of the search results, an e-com search engine also needs to optimize two additional goals that generally do not exist in web search scenarios, i.e., it has to 1) maximize business metrics such as revenue or sales from purchases, and 2) minimize the inventory cost by selling the items faster. Both objectives depend on the performance of the underlying ranking algorithm of the search engine, in addition to assortment selection and demand generation. The first goal of maximizing the business metrics can be achieved by constructing a rank function using a learning to rank framework with a suitable optimization objective that is in line with the business goals of the e-commerce company.
The second goal, however, requires the e-commerce site to have a discovery strategy, so that it can expose as many items as possible to the customers faster and determine which products are selling. This allows the company to readjust its assortment strategy to optimize the inventory and its associated costs. Discovery strategies, however, can hurt the business goals, since they require showing previously unexplored items to the customers.

In e-commerce, learning to rank functions are trained by constructing mainly two types of features: (a) features that arise from the static contents of the product description, and (b) features that come from behavioral signals such as clicks, cart-adds, sales, and review ratings. It has been observed [16, 21] that the second group of features is more useful for finding a comparative relevance among the items. However, the use of machine learning algorithms in search engines tends to bias a search engine toward favoring the items already viewed by users, due to the use of features that are computed from user behavior such as clicks, rates of "add to cart", etc. Since a learning algorithm would rank items that have already attracted many clicks at the top, it might overfit to promote the items viewed by users. As a result, some items might never have an opportunity to be shown to a user (i.e., "discovered" by a user), thus also losing the opportunity to potentially gain clicks. Such "undiscovered" products would then have to stay in the inventory for a long time, incurring extra cost and hurting the satisfaction of the product providers. Hence, it is important to consider integrating the aspect of discovery with the ranking mechanism for e-commerce search. In this paper, we provide a practical yet theoretically sound framework for learning to rank where it is possible to have a regulated discovery mechanism that minimizes the loss in revenue, sales, or relevance metrics.

The key contributions of this paper can be described as follows: (1) This is the first paper that formalizes an e-com search problem as a multi-objective LTR problem with continuous exploration, which we call the learning to rank and discover problem. (2) We discuss how we can construct a solution to this problem using the existing frameworks of multi-armed bandits and auction mechanisms. (3) We also provide a novel paradigm combining algorithms from multi-objective optimization with explore-exploit paradigms.

2. BACKGROUND

2.1 LTR and E-Commerce Search
Learning to rank (LTR) algorithms [19] involve learning a function for optimizing a relevance measure using an offline labeled training data set. This function can be a neural net, a boosted tree [3, 22], a support vector machine [15], a linear model [13], etc. The common ranking measures [14] include normalized discounted cumulative gain (NDCG), mean reciprocal rank (MRR), mean average precision (MAP), etc. The loss functions based on these ranking measures are mostly non-smooth; however, researchers [5, 17] have derived LTR algorithms such as LambdaMART [4] to address such problems. There is also work on multi-objective learning to rank [22] that uses click rates along with human-judged ratings to augment the computation of the gradient direction from pairwise training data in the LambdaMART algorithm. This algorithm can potentially be modified to learn to maximize multiple graded objectives where the priorities are known in advance. There are not many papers on LTR algorithms applied to e-com data. Recently, researchers from Walmart [16] have conducted experiments with multiple LTR algorithms on e-com data. In their observation, the LambdaMART algorithm with a sales and click rate based objective worked well for their data set. In a separate paper, researchers from Amazon [21] reported the use of gradient boosted tree based regression methods [8] for the LTR application. They also reported using various normalization techniques to avoid the biases generated due to the heavy reliance on historical behavioral data in regression.

2.2 Online Learning and Multi-Armed Bandits
Multi-armed bandit (MAB) algorithms [11, 20] have a rich literature. The algorithms that are commonly used for exploration include ε-greedy [23] and variants of the upper confidence bound (UCB) algorithm [2]. Additionally, the k-armed dueling bandit framework [25] has been developed for finding the optimal bandit in a scenario where each bandit can be considered to optimize a specific objective. It is possible to frame a multi-objective problem with an exploration component using such a paradigm. In this paper, we aim to construct a framework that is simpler than k-armed bandits but more practical.

3. OPTIMIZATION OF E-COM SEARCH
We consider the problem of optimizing an e-com search engine over a period of time {1, ..., T}. Let us assume that there are N queries, denoted by Q = (q_1, q_2, ..., q_N), that can be sent to the search engine during this time. Let Z = {ζ_1, ..., ζ_M} be the set of M items, and let the corresponding prices of the items be given by {ρ_1, ..., ρ_M}. Now, consider a rank function π, where π(q_i, t) outputs the top K items as a permutation from the set Z. Let us consider two binary random variables Λ^S_{ijt} and Λ^C_{ijt}, where the first denotes whether a sale happens and the second denotes whether there is a click for an item ζ_j given a query q_i at time t. Given these two binary random variables, we can define the probability of a sale given a query q_i and item ζ_j at time t as p(Λ^S_{ijt} = 1 | q_i, ζ_j, t), and similarly the probability of a click as p(Λ^C_{ijt} = 1 | q_i, ζ_j, t). Let us also create a normalization constant Γ that denotes the total number of search sessions during the time period of the study. Now, we can define several objective functions for e-commerce search, such as the following:

Conversion rate per visit (CRV):

g_CRV = (1/Γ) Σ_{t=1}^{T} Σ_{i=1}^{N} Σ_{j=1}^{M} p(Λ^S_{ijt} = 1 | q_i, ζ_j, t)

Revenue per visit (RPV):

g_RPV = (1/Γ) Σ_{t=1}^{T} Σ_{i=1}^{N} Σ_{j=1}^{M} p(Λ^S_{ijt} = 1 | q_i, ζ_j, t) ρ_j

We now also define a relevance-based objective as follows:

g_REL(π) = (1/Γ) Σ_{t=1}^{T} Σ_{i=1}^{N} RM(π(q_i, t))

where RM can be any relevance measure such as normalized discounted cumulative gain (NDCG), which is generally defined based on how well the ranked list π(q) matches the ideal ranking produced from human annotations of the relevance of each item to the query. The aggregation function does not have to be a sum (over all the queries); it can also be, e.g., the minimum of the relevance measure over all the queries, which would allow us to encode a preference for avoiding any query with a very low relevance measure score. Note that it is possible to define the NDCG based on ratings created from direct user feedback such as clicks, add-to-cart, or sales, since comparing items based on relevance in e-commerce can be hard [21]. In this paper, we assume that the relevance function uses a rating system without any specific notion of how that rating system is constructed. With this generality, we can define the following objective for relevance:

g_REL(π) = min_{i ∈ [1,N], t ∈ [1,T]} RM(π(q_i, t))

Now, we define a notion of discovery for an e-com engine. To formalize the notion of discoverability, we say that the LTR function f is β-discoverable if all items are shown at least β times. We can further define the β-discoverability rate as the percentage of items that are impressed at least β times in a fixed period of time. Let us define a binary variable γ_i for every item ζ_i, where γ_i = 1 if the item was shown in the search results at least β times, and γ_i = 0 otherwise. We can express this as follows:

g_{β-discoverability} = (Σ_{i=1}^{|Z|} γ_i) / |Z|

We can further define another objective, the sell-through rate (STR), to denote the number of items sold in a period of time as a percentage of the inventory. This objective represents the efficiency of selling items faster. Let us define a binary variable λ_s for every item ζ_s, where λ_s = 1 if the item was sold and λ_s = 0 otherwise. Thus, the STR objective under a policy π can be written as follows:

g_STR(π) = Σ_{i=1}^{|Z|} λ_i

Now, we define the objective of an e-com search engine as the following:

maximize g_CRV, g_RPV, g_REL, g_{β-discoverability}, g_STR
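To make the objectives above concrete, the following sketch computes CRV, RPV, the β-discoverability rate, and STR from aggregated logs of a simulated period. This is our own illustration, not code from the paper; the array and function names are ours, and the STR here is normalized by the inventory size, following the text's description of STR as a percentage of the inventory.

```python
import numpy as np

def ecom_objectives(sales, prices, impressions, beta, gamma):
    """Compute CRV, RPV, beta-discoverability rate, and STR.

    sales[i, j]    -- number of sales of item j for query i over the period
    prices[j]      -- item price rho_j
    impressions[j] -- total impressions of item j over the period
    beta           -- impression threshold for discoverability
    gamma          -- normalization constant (total search sessions)
    """
    g_crv = sales.sum() / gamma                          # conversion rate per visit
    g_rpv = (sales.sum(axis=0) * prices).sum() / gamma   # revenue per visit
    g_disc = (impressions >= beta).mean()                # share of items shown >= beta times
    g_str = (sales.sum(axis=0) > 0).mean()               # share of inventory sold at least once
    return g_crv, g_rpv, g_disc, g_str
```

In practice, the sales counts would come from the simulated purchase events, so the empirical CRV and RPV are plug-in estimates of the probabilistic definitions above.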
This is a multi-objective optimization problem with an exploration component in the form of the discovery objective. The priorities of these objectives can be business dependent. For example, a business aiming to maximize gross revenue can prioritize maximizing revenue, while a business aiming to improve sales growth may want to maximize conversions. Similarly, a business that depends on periodically bringing new items into the inventory may want to prioritize the discovery objective along with sales and revenue. In most real scenarios, such business priorities come from the strategic directions of the e-commerce platform. We now discuss strategies for solving this optimization problem.

4. STRATEGIES FOR SOLVING THE OPTIMIZATION PROBLEM
Since there are multiple objectives to optimize, it is impossible to directly apply an existing learning to rank (LTR) method to optimize all the objectives. However, there are multiple ways to extend an LTR method to solve the problem, as we discuss below.

4.1 Direct extension of LTR
One simple strategy is to combine the multiple objectives with appropriate weighting parameters so as to form one single objective function, which can then be used in a traditional LTR framework to find a ranking that would optimize the consolidated objective function. The weights can be based on the business priorities. The advantage of this approach is that we can directly build on the existing LTR framework, though the new objective function would pose new challenges in designing effective and efficient optimization algorithms to actually compute optimal rankings. One disadvantage of this strategy is that we cannot easily control the tradeoff between different objectives (e.g., we may sometimes want to set a lower bound on one objective rather than maximize it). To address this limitation, we may allow some objectives to be treated as constraints.

4.2 Incremental Optimization of E-Com Search
An alternative strategy is to take an existing LTR ranking function as a basis and seek to improve the ranking (e.g., by perturbation) so as to optimize the multiple objectives described above; such an incremental optimization strategy is more tractable, since we search for solutions in the neighborhood of an existing ranking function. Specifically, we instantiate the optimization problem discussed in the previous section by assuming that the engine incrementally perturbs a learning to rank function slightly, in a way that the original relevance does not vary too much, so that the sequence of functions π = (f_1, ..., f_T) over time steps t = 1 to t = T gradually improves upon the multiple objectives. Intuitively, we start with a function f and generate the next function in the "neighborhood" of f within a relevance bound, and we take each step in a way that attempts to improve sales and revenue with high probability while also trying to achieve better β-discoverability and STR for the items in the inventory. However, the search space of all such ranking functions is still intractable, and we need a heuristic to obtain a "good" Pareto-optimal solution for the multi-objective optimization problem. We thus need to construct an optimization framework that integrates an exploration mechanism with a learning to rank paradigm.

4.3 Discovery Framework
One way of constructing a framework that optimizes an e-com site for multiple business objectives along with an exploration option is to use a fixed number of rank positions, say x, for exploration and to use the remaining top (K − x) positions for exploitation. In this set-up, we can construct an explore set of items E in which all items have been shown fewer than β times. We then select the top x candidates for a query q_i from the intersection of the recall set of items for the query and the set E; let us call this set E_i. The items for the remaining (K − x) positions can be selected by a suitable multi-objective LTR ranking function. This framework is practical, and the cost of exploration can be estimated as the revenue lost by not selecting the most optimal items for the x positions. Note that it is also possible to use any rank positions to show the explored items; however, in this paper we assume that we keep those x items at the top, to keep our framework simple to analyze.

4.4 LTR and Discoverability
Now that we have a formal model for the e-commerce search optimization problem, we want to establish the following point:

• An LTR function is not optimal for the discoverability criterion.

In an e-com site, only the top K items are shown to the customers. Suppose there is an item ζ that is relevant to a set of queries Q. Now, assume that for each query q ∈ Q there are K other relevant old items for which f(X2) > 0. Assuming that the K items are equally relevant, we can say that ζ has the same value of f(X1) as the K old items. Then, ζ will always rank below the top K items for the set of queries in Q. Suppose the current number of impressions for ζ is less than β; then it has no chance of receiving impressions under the ranking framework using function f, because it will always be ranked below the top K items unless there is a new query that ranks this item higher. This means that as long as we can find a percentage of items that receive fewer than β impressions, it is hard to improve the β-discoverability using a traditional LTR mechanism without any integrated discovery mechanism.

5. DISCOVERY STRATEGIES
In this section, we propose several strategies for selecting the x items for discovery that aim to minimize the loss of business metrics.

5.1 Auction based strategy
In this strategy, the items for the top x positions can be assigned based on an auction in which sellers bid for the positions. This strategy has been used in sponsored search by search engine companies such as Google and in sponsored products by e-commerce companies such as Amazon. The design of such an auction mechanism that balances revenue and relevance is a complex subject and has been discussed in recent papers by Ghose et al. [10] and Athey et al. [1]. However, many e-commerce businesses may not want to use an auction strategy to aid discovery, since it can hurt the growth of seller participation on those sites.

5.2 Exploration with LTR (eLTR)
This is a strategy discussed in a recent paper by Goswami et al. [12]. In this strategy, we define the set from which the LTR function selects the items as L_i ⊂ R_i for a given query q_i. We assume that all the items outside the set L_i are not β-discoverable. Then, L = ∪_{i=1}^{N} L_i is the set of all β-discoverable items. Hence, the set E = R \ L consists of all the items that require exploration. Goswami et al. [12] discuss three strategies to incorporate discovery in e-commerce search.

Random selection based exploration from the recall set (RSE): This is a baseline strategy for continuous exploration with an LTR algorithm. For every query q_i, we randomly select x items from the set E ∩ R_i. Then, we put these x items on top of the other (K − x) items that are selected using LTR from the set R_i. The regret here will be linear in the number of search sessions with exploration.

Upper confidence bound (UCB) based exploration from the recall set (UCBE): This is another simple strategy that uses a variant of a UCB-based algorithm for exploration instead of random sampling. Here, we maintain a MAB for each query, and we consider each item in the set E ∩ R_i as an arm of the MAB corresponding to query q_i. We maintain a UCB score for each of those items based on sales over impressions for the query. If an item ζ_j in the set E ∩ R_i is shown b_j times in T iterations and is sold a_j times in between, then the UCB score of the item ζ_j is

ucb_j = a_j / b_j + sqrt(2 log T / (b_j + 1))

Note that this is for a specific query. We then select the x items with the top UCB scores.

Explore LTR (eLTR): In this strategy, we define a function that we call explore LTR (eLTR) to select the x items. The rest of the items for the top K can be chosen using the traditional LTR. Then, we can either keep the x items on top or rerank all K items based on eLTR. The main motivation for eLTR is the observation that there is inherent overfitting in the regular ranking function used in an e-com search engine that hinders exploration, i.e., hinders improvement of β-discoverability and STR. The overfitting is mainly caused by a subset of features derived from user interaction data. Such features are very important, as they help infer a user's preferences, and the overfitting is actually desirable for many repeated queries and for items that have sustained interest from users (since they are "test cases" that have occurred in the training data), but it gives old items a biased advantage over new items, limiting the increase of β-discoverability and STR. The main idea behind eLTR is thus to separate such features and introduce a parameter to restrict their influence on the ranking function, indirectly promoting STR. Formally, we note that, in general, a ranking function can be written in the following form:

y = f(X) = g(f_1(X1), f_2(X2))

where y ∈ R denotes a ranking score, X ∈ R^N is an N-dimensional feature vector, and X1 ∈ R^{N1} and X2 ∈ R^{N2} are two different groups of features such that N1 + N2 = N and X1 ∪ X2 = X. The two groups of features are meant to distinguish features that are unbiased (e.g., content-matching features) from those that are inherently biased (e.g., clickthrough-based features). Here g is an aggregation function which is monotonic with respect to both arguments. It is easy to show that any linear model can be written as a monotonic aggregation function. It is not possible to use such a representation for models such as additive trees. However, our previous techniques do not have such a limitation, since they are completely separated from the LTR. In this paper, we keep our discussion limited to linear models. We now define the explore LTR (eLTR) function as follows:

y_e = f_e(X) = g(f_1(X1), ε × f_2(X2))

where y_e ∈ R and 0 ≤ ε ≤ 1 is a variable in our algorithmic framework. Since g is monotonic, f_e(X) ≤ f(X) when ε ≤ 1. Since the feature set X2 is a biased feature set favoring old items, we can expect that ranking based on f_e will be more in favor of new items than the original f, achieving the goal of emphasizing exploration of new items. Note that ε controls the amount of exploration: the smaller ε is, the more exploration (at the cost of exploitation). The maximum exploration is achieved when ε = 0, in which case the ranking relies entirely on f_1, and the only loss in the original objective function is incurred by the removal of f_2. By controlling which features are included in f_2, we can control the upper bound of the loss. In this sense, eLTR ensures a "safe" exploration strategy, since f_1 is always active. Note that this composed function gradually becomes the LTR function as ε tends to 1. There can be various ways of constructing ε, such as the following:

eLTR basic exploration (eLTRb): In this strategy, we keep ε = I / T_max, where I is the current iteration and T_max is the maximum number of iterations, after which everything can be reset. This is a very simple strategy in which eLTR just increases the importance of the behavioral features gradually with every iteration.
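As a minimal illustration of the composed score for a linear model, the following sketch uses a sum as the monotonic aggregator g. This is our own code, not the paper's; the feature and weight names are ours.

```python
import numpy as np

def eltr_score(x1, x2, w1, w2, eps):
    """Explore-LTR score for a linear ranker: with g taken as a sum
    (monotonic in both arguments), y_e = f_1(X1) + eps * f_2(X2).

    x1, w1 -- unbiased (content-matching) features and their weights
    x2, w2 -- biased (behavioral) features and their weights
    eps    -- exploration knob in [0, 1]: 0 gives maximum exploration,
              1 recovers the plain LTR score f(X)
    """
    return np.dot(w1, x1) + eps * np.dot(w2, x2)
```

With non-negative behavioral scores, lowering eps can only lower the score of heavily clicked items, which mirrors the "safe" exploration property f_e(X) ≤ f(X).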
eLTR ucb weighted exploration (eLTRu): In this strategy, we keep ε = ucb_j / U_j. Here, U_j is a normalization factor; in our experiment it is chosen to be the maximum UCB score in the set E ∩ R_i.

eLTR ucb weighted exploration and reranking (eLTRur): This strategy first selects the top x items using eLTRu, selects the remaining (K − x) items using the classic LTR, and then reranks the K items using eLTRu.

Goswami et al. [12] show that eLTR can be considered a monotonic submodular function and can thus be approximated by a greedy algorithm.

5.3 Adaptive submodular function based eLTR (AeLTR)
This is an extension of the eLTR mechanism presented by Goswami et al. [12]. However, it makes the strategy much more general by allowing us to use any LTR rank function, going beyond the limitation of linear LTR models. In this, we use the following function:

y_a = f_a(X) = ε_a f(X)

Here, ε_a is a function of the UCB score as follows:

ε_a = ucb_j / U_j

This algorithm has been discussed in a paper by Gabillon et al. [9] and is proven to also be a monotonic submodular function. Hence, a simple greedy algorithm can give a nice approximation for this as well. We do not, however, know how this strategy compares in terms of losses and the discovery metric with the other approaches discussed in this paper.

5.4 Regret of the eLTR strategies
The RSE strategy has linear regret, since it is random. The regret for this can be arbitrarily bad and can thus be given by O(R_l). The regret for the UCBE algorithm can be given by O(log(R_l)). Both eLTR and AeLTR have similar regret, because they both use the monotonic submodularity property. These algorithms pull items using a score that is a function of the LTR score and are expected to provide a better loss than the approaches based on multi-armed bandit algorithms. However, it is not immediately clear how these algorithms can optimize the discoverability objective. The regret of the eLTR group of algorithms can be estimated as O(1 + 1/e − |E|/x) times worse than the optimal, which comes from their monotonic submodularity property. The eLTR algorithms can be considered similar to ε-greedy approaches, where the regret bound is not better than that of the UCB algorithm. However, in our simulation, we want to study the discoverability metrics for such eLTR algorithms.

6. EVALUATION METHODOLOGY
Due to the involvement of multiple objectives, the evaluation of e-com search algorithms also presents new challenges. Here we discuss some ideas for evaluating the proposed eLTR algorithm, which we hope will stimulate more work in this direction.

Given a set of queries and an initial ranking function, an eLTR method is expected to "discover" the ideal ranking for each query over a sequence of iterations. The ideal ranking may be defined as one that maximizes business metrics like sales, revenue, etc. Since exploration is involved, one can expect the quality of ranking (NDCG), and consequently sales/revenue, to fluctuate for individual queries during the discovery process. However, the benefit of an eLTR method lies in its ability to deliver greater aggregate sales/revenue over some reasonable number of iterations N, compared to a non-discovery LTR method. We must thus compare our methods on the iterative growth in ranking quality and the aggregate gain in business metrics.

The ideal approach for conducting such an evaluation would be to simultaneously deploy all candidate methods to live user traffic and compute various user engagement metrics such as click-through rate, sales, revenue, etc. However, this strategy is difficult to implement in practice. Since the user traffic received by each candidate method is different, we would need to direct a substantial amount of traffic to each method to make the observations comparable and the conclusions statistically significant. Deploying a large number of experimental and likely sub-optimal ranking functions, especially when evaluating baselines, can result in significant business losses for e-commerce search engines. More importantly, such an approach using live user traffic would not allow a fair comparison of any future algorithm with the ones we test today, so it does not facilitate further study of the problem. Perhaps a good and feasible strategy is to design a simulation-based counterfactual evaluation method [18]. Here, unlike with a traditional static test collection, we also have to introduce models to simulate user behavior and to model the biases observed in a real system. This can be a topic in itself, and we omit further discussion as it is out of scope for this short paper.

7. EXPERIMENTAL RESULTS
In this section, we first construct a synthetic historical dataset with queries, items, and their prices. We also generate the true purchase probabilities and utility scores for the item and query pairs. Additionally, we use a specific rank function to simulate the behavior of a trained LTR model. Then we conduct a simulation as described in section ?? with various exploration strategies. During the simulation, we use the observed purchase probabilities estimated from the purchase feedback as the most important feature for the rank function, but we use the true probabilities generated during the initial data generation phase to simulate the user behavior. The main goal of this experimental study is to evaluate the behavior of the exploration strategies under different distributions of the utility scores, representing different states of the inventory in an e-com company.
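The per-query ranking step of the simulated discovery framework can be sketched as follows. This is our own illustrative code, not the paper's implementation; here the x exploration slots are filled at random, as in the RSE baseline, and the helper names are ours.

```python
import random

def result_page(recall, explore_set, ltr_score, K, x, rng=None):
    """Assemble one search result page under the discovery framework:
    up to x top slots go to under-exposed items (the recall set
    intersected with the explore set E), picked at random; the
    remaining slots are filled from the rest of the recall set in
    decreasing LTR-score order."""
    rng = rng or random.Random(0)
    candidates = [item for item in recall if item in explore_set]
    explored = rng.sample(candidates, min(x, len(candidates)))
    rest = sorted((item for item in recall if item not in explored),
                  key=ltr_score, reverse=True)
    return explored + rest[:K - len(explored)]
```

Swapping the random pick for a selection by top UCB score (UCBE) or by the eLTR/AeLTR scores changes only the choice of `explored`; the rest of the loop is shared by all strategies.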
This metric is computed Due to the involvement of multiple objectives, the eval- by summing the reciprocal ranks of all the items that are uation of E-Com search algorithms also presents new chal- purchased in various user visits for all queries. Moreover, we also discretize our gold utility score between 1 to 5 and Algorithms RPV NDCG MRR β−d generate a rating for each item. This also allows us to com- LTR 0.12 0.96 0.54 0.10 pute a mean NDCG score at k-th position for all the search RSE 0.06 0.76 0.21 0.78 sessions as a relevance metric. UCBE 0.08 0.87 0.28 0.66 We expect to see that the RPV and NDCG of the LTR eLTRb 0.09 0.94 0.33 0.67 function will be the best. however the β-discoverability val- eLTRu 0.09 0.94 0.33 0.67 ues will be better in ranking policies that use an exploration eLTRur 0.09 0.94 0.33 0.67 strategy. The new ranking strategies will incur loss in RPV AeLTR 0.10 0.95 0.40 0.51 and NDCG and based on our theoretical analysis we ex- pect the eLTR methods to have less loss compared to the RSE and UCB based approaches in those measures. We Table 1: Simulation of eLTR framework, with |Q| = 100, |Z| = also expect to see a loss in MRR for all exploration meth- 5000, |L| = 200, β − d = 10%, β = 50, K = 6, x = 3, T = 50000. ods. However, we mainly interested in observing how these algorithms perform in β-discoverability metric compared to LTR. through the search results one after another from the top and can purchase an item based on that item’s purchase probability for that query. Note, in order to keep the simu- 7.1 Synthetic data generation lation simple, we consider an user only purchases one item We first generate a set of N queries and M items. We in one visit and leaves the site after that. 
7.1 Synthetic data generation

We first generate a set of N queries and M items. We then assign the prices of the items by randomly drawing a number between a minimum and a maximum price from a multi-modal Gaussian distribution that can have 1 to 10 peaks for a query. We select the specific number of peaks for a query uniformly at random. We also assign a subset of the peak prices to each query as its set of preferred prices. This creates a situation where every query may have a few preferred price peak points, at which it may also have sales or revenue peaks.

Now that we have the items and queries defined, we randomly generate a utility score, denoted by u_ij, for every item ζ_j and query q_i. In our setup, we use uniform random, Gaussian, and long-tailed distributions for selecting the utilities. These three distributions represent three scenarios for a typical e-com company's inventory. Additionally, we generate a purchase probability between 0.02 and 0.40 for every item for every query. We generate these probabilities such that they are correlated with the utility scores, with a statistically significant (p-value less than 0.10) Pearson correlation coefficient [24]. We also intend to correlate the purchase probability with the preferred peak prices for a query. Hence, we give an additive boost between 0 and 0.1 to the purchase probability in proportion to the absolute difference of the price of the item from the closest preferred mean price for that query. By generating the purchase probabilities in this way, we ensure that the actual purchase probabilities are related to the preferred prices for the queries as well as to the utility scores of the items for a given query. Now, we define a β-discoverability rate b and select b × M items randomly from the set of all items; these selected items form the set E. In our simulation, we assume that the estimated (observed) purchase probabilities of all the items in the set E are zero at the beginning, while the purchase probabilities of the remaining items are assumed to be estimated correctly from the start.
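One possible realization of this generation process is sketched below. The toy sizes, the standard deviation of the price peaks, and the exact form of the price-proximity boost are our own assumptions, not the paper's recipe; here the boost grows as the price gets closer to a preferred price so that the two quantities are linked.

```python
import random

random.seed(7)

N_QUERIES, M_ITEMS = 10, 200          # toy sizes (the paper uses |Q| = 100, |Z| = 5000)
MIN_PRICE, MAX_PRICE = 5.0, 500.0

def make_query():
    # 1..10 price peaks per query; a random subset of them is "preferred".
    peaks = [random.uniform(MIN_PRICE, MAX_PRICE) for _ in range(random.randint(1, 10))]
    preferred = random.sample(peaks, k=random.randint(1, len(peaks)))
    return {"peaks": peaks, "preferred": preferred}

def draw_price(query):
    # Multi-modal Gaussian: pick a peak, sample around it, clip to the price range.
    mu = random.choice(query["peaks"])
    return min(MAX_PRICE, max(MIN_PRICE, random.gauss(mu, (MAX_PRICE - MIN_PRICE) / 20)))

def purchase_prob(utility, price, query):
    # Base probability in [0.02, 0.40], increasing with utility so the two correlate.
    base = 0.02 + 0.38 * utility
    # Additive boost up to 0.1 tied to the distance from the closest preferred price
    # (our interpretation of the boost rule in the text).
    d = min(abs(price - p) for p in query["preferred"]) / (MAX_PRICE - MIN_PRICE)
    return min(0.40, base + 0.1 * (1.0 - d))

queries = [make_query() for _ in range(N_QUERIES)]
prices = [draw_price(random.choice(queries)) for _ in range(M_ITEMS)]
# Gaussian utility scenario (mean 0.5, variance 0.1), clipped to [0, 1].
utilities = [min(1.0, max(0.0, random.gauss(0.5, 0.1 ** 0.5))) for _ in range(M_ITEMS)]
probs = [[purchase_prob(u, p, q) for u, p in zip(utilities, prices)] for q in queries]
```

Swapping the Gaussian draw for a uniform or power-law draw gives the other two inventory scenarios.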
Now, we create a simple rank function that is a weighted linear function of the utility score (u), the observed purchase probability (po), and the normalized absolute difference between the product price and the closest preferred mean price (m̂p) for the query, such that l = 0.60po + 0.20u + 0.20m̂p, where l denotes the score of the ranker. This ranker simulates a trained LTR function in e-com search, where sales can usually be considered the most valuable behavioral signal.

We now construct a user model. A user browses through the search results one after another from the top and can purchase an item based on that item's purchase probability for that query. Note that, to keep the simulation simple, we assume a user purchases at most one item in one visit and leaves the site after that. We also apply a logarithmic discount to the purchase probability at lower ranks by multiplying it by 1/log2(r + 1), where r is the ranking position of the item. This creates the effect of position bias [7].

7.2 Description of the experimental study

We conduct two sets of experiments with this simulated data.

In the first set of experiments, we generate the utility scores for all the products from a Gaussian distribution with mean 0.5 and variance 0.1, keeping everything else the same as in the setup described above. This means that the utility of the items in the inventory follows a normal distribution. Table 1 shows the final metrics for all the algorithms. We observe that with a Gaussian distribution of utility scores the eLTR approaches have better MRR and β-discoverability. We also observe that the AeLTR approach works well, producing good MRR while achieving better β-discoverability than LTR. This is a powerful result, since it shows that it is possible to integrate this approach with more sophisticated LTR algorithms.

In the second set of experiments, we use a power law to generate the utility distribution. This means that only a small set of items can be considered valuable in this scenario. Table 2 shows the final metrics for this case. We notice that even with this distribution of utility scores the eLTR variants have a smaller loss in RPV, NDCG, and MRR. Note that under this distribution, discoverability is naturally less useful, since a large number of items are not that valuable. We expect that in such a situation a good discoverability algorithm can help eliminate items that do not get sold after sufficient exposure and thus enable the e-com company to optimize its inventory. We observe a very similar trend for the AeLTR algorithm as in our previous experiment: it gives better RPV, and MRR close to that of the LTR algorithm, but it has lower β-discoverability compared to the eLTR approaches.

Algorithms   RPV    NDCG   MRR    β−d
LTR          9.56   0.57   0.45   0.11
RSE          7.55   0.27   0.25   0.76
UCBE         7.55   0.27   0.26   0.66
eLTRb        8.2    0.33   0.31   0.67
eLTRu        8.3    0.33   0.31   0.67
eLTRur       8.4    0.33   0.32   0.67
AeLTR        9.4    0.41   0.40   0.40

Table 2: Simulation of the eLTR framework, with |Q| = 100, |Z| = 5000, |L| = 200, β−d = 10%, β = 50, K = 6, x = 3, T = 50000.
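The linear ranker and the position-biased user model described in Section 7.1 can be sketched as follows (function and variable names are our own; the weights and the 1/log2(r + 1) discount follow the text):

```python
import math
import random

random.seed(42)

def rank_score(p_obs, utility, price_match):
    # l = 0.60*po + 0.20*u + 0.20*m̂p: the weighted linear ranker of Section 7.1.
    return 0.60 * p_obs + 0.20 * utility + 0.20 * price_match

def simulate_visit(items, true_prob):
    """One user visit: browse the ranked results top-down and buy at most
    one item. The purchase chance at rank r is the item's true purchase
    probability discounted by 1/log2(r + 1) (position bias)."""
    ranked = sorted(items, key=lambda it: rank_score(*it[1]), reverse=True)
    for r, (item_id, _) in enumerate(ranked, start=1):
        if random.random() < true_prob[item_id] / math.log2(r + 1):
            return item_id, r          # purchased item and its rank
    return None, None                  # user leaves without purchasing
```

Because the observed purchase probability carries the largest weight, an item with strong sales history outranks a higher-utility item with little history, which is exactly why the undiscovered set E needs explicit exploration.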
8. CONCLUSIONS AND FUTURE WORK

This paper represents a first step toward formalizing the emerging E-Com search problem as an optimization problem with multiple objectives, including sales (CRV), revenue (RPV), and discoverability besides relevance. We formally define these objectives and discuss multiple strategies for conducting exploration with learning to rank algorithms. We hope that our work will open up many new directions in research for optimizing e-com search. The obvious next step is to empirically validate the strategies on log data from an e-com search engine. The theoretical framework also enables many interesting ways to further formalize the e-com search problem and develop new effective e-com search algorithms. Finally, the proposed discovery framework is just a small step toward solving the new problem of optimizing discoverability in e-com search; it is important to further develop more effective algorithms using these algorithms as baselines.

9. REFERENCES

[1] S. Athey and D. Nekipelov. A structural model of sponsored search advertising auctions. In Sixth Ad Auctions Workshop, volume 15, 2010.
[2] P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235–256, 2002.
[3] C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender. Learning to rank using gradient descent. In Proceedings of the 22nd International Conference on Machine Learning, pages 89–96. ACM, 2005.
[4] C. J. Burges. From RankNet to LambdaRank to LambdaMART: An overview. Learning, 11(23-581):81, 2010.
[5] C. J. Burges, R. Ragno, and Q. V. Le. Learning to rank with nonsmooth cost functions. In Advances in Neural Information Processing Systems, pages 193–200, 2007.
[6] N. Craswell. Mean reciprocal rank. In Encyclopedia of Database Systems, pages 1703–1703. Springer, 2009.
[7] N. Craswell, O. Zoeter, M. Taylor, and B. Ramsey. An experimental comparison of click position-bias models. In Proceedings of the 2008 International Conference on Web Search and Data Mining, WSDM '08, pages 87–94. ACM, 2008.
[8] J. H. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, pages 1189–1232, 2001.
[9] V. Gabillon, B. Kveton, Z. Wen, B. Eriksson, and S. Muthukrishnan. Adaptive submodular maximization in bandit setting. In Advances in Neural Information Processing Systems, pages 2697–2705, 2013.
[10] A. Ghose and S. Yang. An empirical analysis of search engine advertising: Sponsored search in electronic markets. Management Science, 55(10):1605–1622, 2009.
[11] J. Gittins, K. Glazebrook, and R. Weber. Multi-armed Bandit Allocation Indices. John Wiley & Sons, 2011.
[12] A. Goswami, C. Zhai, and P. Mohapatra. Learning to rank and discover for e-commerce search. In International Conference on Machine Learning and Data Mining in Pattern Recognition.
[13] L. Hang. A short introduction to learning to rank. IEICE Transactions on Information and Systems, 94(10):1854–1862, 2011.
[14] K. Järvelin and J. Kekäläinen. IR evaluation methods for retrieving highly relevant documents. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 41–48. ACM, 2000.
[15] T. Joachims. Optimizing search engines using clickthrough data. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 133–142. ACM, 2002.
[16] S. K. Karmaker Santu, P. Sondhi, and C. Zhai. On application of learning to rank for e-commerce search. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '17, pages 475–484, New York, NY, USA, 2017. ACM.
[17] H. Li. A short introduction to learning to rank. IEICE Transactions on Information and Systems, 94(10):1854–1862, 2011.
[18] L. Li, S. Chen, J. Kleban, and A. Gupta. Counterfactual estimation and optimization of click metrics in search engines: A case study. In Proceedings of the 24th International Conference on World Wide Web, pages 929–934. ACM, 2015.
[19] T.-Y. Liu. Learning to rank for information retrieval. Foundations and Trends in Information Retrieval, 3(3):225–331, 2009.
[20] A. Mahajan and D. Teneketzis. Multi-armed bandit problems. In Foundations and Applications of Sensor Management, pages 121–151. Springer, 2008.
[21] D. Sorokina and E. Cantu-Paz. Amazon search: The joy of ranking products. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '16, pages 459–460, New York, NY, USA, 2016. ACM.
[22] K. M. Svore, M. N. Volkovs, and C. J. Burges. Learning to rank with multiple objective functions. In Proceedings of the 20th International Conference on World Wide Web, pages 367–376. ACM, 2011.
[23] J. Vermorel and M. Mohri. Multi-armed bandit algorithms and empirical evaluation. In European Conference on Machine Learning, pages 437–448. Springer, 2005.
[24] R. R. Wilcox. Introduction to Robust Estimation and Hypothesis Testing. Academic Press, 2011.
[25] Y. Yue, J. Broder, R. Kleinberg, and T. Joachims. The k-armed dueling bandits problem. Journal of Computer and System Sciences, 78(5):1538–1556, 2012.