Personalizing Item Recommendation via Price Understanding

Soumya Wadhwa (soumya.wadhwa@walmartlabs.com), Ashish Ranjan (ashish.ranjan@walmartlabs.com), Selene Xu (yue.xu@walmartlabs.com), Jason H.D. Cho (hcho@walmartlabs.com), Sushant Kumar (skumar4@walmartlabs.com), Kannan Achan (kachan@walmartlabs.com)
Walmart Labs, Sunnyvale, California

Copyright (c) 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

ABSTRACT

Personalization has gained a lot of traction in the e-commerce domain, since there is ample evidence for short-term and long-term benefits of understanding user preferences and ensuring user satisfaction. However, effectively personalizing recommendations is a challenging task, especially at scale. Price is often a key consideration for purchases, and user behavior varies widely depending on demographic and psychological factors. While difficult to model, this is an important signal to consider for user-item recommendation. In this paper, we focus on personalizing and improving the relevance of item recommendations for e-commerce users by leveraging price as an essential input. More concretely, we segregate items into price bands indicating how expensive they are, infer user affinity to price bands based on historical behavior, and use features derived from this knowledge to re-rank items in a real-world recommendation scenario. We experiment with various statistical and machine learning methods to determine item price bands, user price affinities and item price similarities, and demonstrate impact on recommendation quality for millions of users and items.

1 INTRODUCTION

Recommender systems are ubiquitous on websites today. Recommendation algorithms can be based on item-item interactions or user-item feedback. In recent times, websites are increasingly focusing on providing an experience tailored to their users [35] [10] [36], with personalization at the segment or individual level. Understanding user preferences and recommending relevant items accordingly has been shown to improve user satisfaction and conversion rates, which is a win-win situation [4] [5]. While essential, scaling the personalization of recommendations anchored on combinations of users and items is very challenging, especially in the e-commerce domain, where millions of users can potentially interact with millions of items.

For most users, price is a key factor in making purchases [14] [8] [34]. Users make price-value trade-offs when they purchase products, and their behavior can vary widely depending on demographic factors such as salary or location, and psychological factors such as money consciousness or particular interest in certain types of products. Consider two users. The first is a sound engineer looking to purchase high-quality headphones for use at work; since this user needs to discern any imperfections, they may be looking to purchase expensive headphones. The second user wants headphones to listen to podcasts; as long as the podcast is intelligible, sound quality is not an issue, and they can buy lower-priced headphones. To effectively personalize their shopping journeys, understanding that the first customer is looking for higher-priced headphones and the second for lower-priced ones will help recommend the products they are looking for.

However, defining what constitutes a high-priced or low-priced item is difficult. In the e-commerce domain, products of course have a price associated with them, but we do not know whether a given price is considered expensive or inexpensive for a given type of product, for example, a light bulb versus a laptop. $100 may be a bargain for the laptop, but the same price tag would make the light bulb very expensive. We therefore need to understand item prices for each product type and categorize items into different price bands (e.g. low vs. high) on that basis. Subsequently, we can start understanding which price bands users are likely to purchase from for different product types.

To summarize, using price to personalize item recommendation is challenging because user price preferences must be inferred implicitly and vary based on the type of product. Additionally, raw item prices are not sufficient to determine whether a product is considered expensive, and need to be standardized so that they can be compared across different types of products. In this paper, we aim to model user price affinity and item price similarity, and utilize them as input signals along with item-item relevance scores to personalize and improve the quality of item recommendations for e-commerce users.
We achieve this using the following:

• Unsupervised methods to divide items into price bands indicating their degree of expensiveness
• Supervised methods to compute user affinity to different price bands based on their historical interactions
• Item and user price-related features to re-rank items in an actual user-item recommendation setting

This is done at the Product Type (PT) level, the most granular level of the product taxonomy available in the Walmart product catalog. We use a large e-commerce dataset and experiment with multiple statistical and machine learning methods to determine item price bands, user price affinities and item price similarities. We quantitatively show the positive impact on recommendation quality of including price-related features in the re-ranking algorithm.

2 RELATED WORK

There has been extensive research on recommender systems and personalization. Many research efforts have focused on collaborative filtering-based techniques. Traditional matrix factorization (MF) models [24] and their variants [17] [31], which incorporate implicit feedback, temporal effects and confidence levels, have proved superior to classic nearest neighbor approaches for recommending items [33]. Factorization machines [30] have also been used for recommendation to overcome feature sparsity issues. Emerging deep learning based solutions [6] [15] [13] have shown promising results for recommendation. Item embeddings can be used to compute item-item similarity and recommend items accordingly [37] [11]. For modeling recommendations based on short session-based data, a sequential Recurrent Neural Network (RNN) [28] based approach can be used to predict the next item [16]. More recently, causal embeddings for recommendation [3] have shown significant improvements over state-of-the-art factorization methods.

Price is an important factor for users making an online purchase. In [34], a conceptual framework is developed to explain the effects of the online medium on customer price sensitivity. User price sensitivity and price thresholds are discussed in [14]. Traditional and online supermarkets are compared in terms of user behavior with respect to brand, price and other search attributes in [8], and price sensitivity is found to be higher online. There are also studies on the impact of advertising [20] and brand credibility [9] on price sensitivity. In [38], the potential effect of the consumption occasion (functional vs. hedonic), social context and household income on users' price sensitivity is analyzed. There is substantial additional literature on consumer price sensitivity.

However, price has received relatively little attention as an input signal for recommendation. Price is used as a feature in [1] for personalization in the e-commerce domain, by taking the ratio of the price of the current item to the average price of previously clicked items. There is also a brief mention of how price can affect a user's affinity towards an item in [17]. The authors of [18] analyze logs to investigate what makes recommendations effective in practice, and include some factors based on "price levels" per product category; however, methods for determining these price levels are not discussed, and user price affinity is incorporated as an average of recent price levels (not per category). Their focus is on the impact of popularity, discounts, reminders and recency on user click behavior. In [12], Willingness To Pay (WTP) distributions per user and product are modeled and used together with discount indication and seller reputation in a context-aware recommendation model to improve recommendation quality. More recently, [40] model the transitive relationship between user-to-item and item-to-price using Graph Convolutional Networks (GCN) [22] to make the learned user representations price-aware. They incorporate prices and categories as nodes along with users and items in a heterogeneous graph. They also treat price as a categorical variable and discretize price values into separate levels based on price ranges, but do not experiment with different methods for doing so.

In our work, we explore several methods to compute item price bands and explicitly model user price affinity for various types of products, such that these input signals can be leveraged generally for use cases such as recommendation and search. We demonstrate results for user-item and item-item price-related features obtained from different model variations, used together with an item-item relevance score to personalize item-anchored recommendations.

3 METHODOLOGY

Our goal is to use past item interaction data (such as clicks and add-to-carts) for a given user to predict their affinity for a particular price band, and eventually incorporate this price understanding into recommendations. To achieve this, items are clustered into price bands at the product type level (Section 3.1), and user activity patterns are then learnt with respect to these item price bands to predict the probability that the user will purchase an item from a particular price band versus others for that product type. These predicted user-item price band affinity scores (Section 3.2) and item-item price band similarity scores (Section 3.3) are used as features along with relevance scores to re-rank item-anchored recommendations (Section 3.4) for personalization using price understanding.

3.1 Item Price Bands

Item prices vary a lot, from less than 10 dollars for a USB drive to thousands of dollars for a QLED television. However, the absolute price alone is insufficient to decide whether an item is expensive. It is also important to take the product type into consideration, since a price of $100 might be low for televisions but high for a USB drive. Thus, we need to create representations for item prices within each product type such that they are directly comparable across different items. We assign each item to one of n bands, using unsupervised methods since labels are unavailable.

3.1.1 Statistical Methods. We first explore statistical methods using item prices for each product type (sketched after the list below).

• Range-Based: For example, say television prices vary from $100 to $5100, and we decide to create 3 price bands with price range ratios 3:5:2. Then one unit of the range becomes ($5100 - $100)/(3+5+2) = $500, and the lowest price band extends from $100 to $100 + 3*$500 (= $1600), the middle from $1600 to $1600 + 5*$500 (= $4100), and the highest from $4100 to $4100 + 2*$500 (= $5100).
• Percentile-Based: For example, in the above situation, say we decide to create 3 price bands holding 30%, 50% and 20% of items. If there are 200 televisions in our item catalog, with 60 TVs priced below $500, 100 TVs priced between $500 and $2500, and 40 TVs priced above $2500, then those boundaries delineate the price bands.

Splitting items into equal bins based on range or percentiles did not work well in practice, due to skew in item price distributions and transaction volumes, and creating unequal bins requires extensive manual tuning.
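Both statistical schemes reduce to simple boundary computations over a product type's price list. The following is a minimal sketch of the two methods, not the production implementation; the function names and the uniform toy data are our own illustration:

```python
import numpy as np

def range_based_bands(prices, ratios=(3, 5, 2)):
    """Split the [min, max] price range into bands proportional to `ratios`.

    E.g. prices spanning $100-$5100 with ratios 3:5:2 give boundaries
    [100, 1600, 4100, 5100], matching the television example above.
    """
    lo, hi = float(min(prices)), float(max(prices))
    unit = (hi - lo) / sum(ratios)
    bounds = [lo]
    for r in ratios:
        bounds.append(bounds[-1] + r * unit)
    return bounds

def percentile_based_bands(prices, shares=(30, 50, 20)):
    """Place boundaries so each band holds the given percentage of items."""
    cuts = np.cumsum(shares)[:-1]                      # e.g. [30, 80]
    return [min(prices), *np.percentile(prices, cuts), max(prices)]

# Toy example: 200 hypothetical television prices.
tv_prices = np.random.default_rng(0).uniform(100, 5100, 200)
print(range_based_bands(tv_prices))
print(percentile_based_bands(tv_prices))
```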
3.1.2 Clustering. Next, we use common clustering methods to automatically group the items of each product type into n clusters based on their price values (a sketch follows the list).

• K-Means [26]: Each item is assigned to the cluster whose mean price value is closest to the item price.
• Gaussian Mixture Model (GMM) [32]: We assume that all the price values are generated from a mixture of a finite number of normal distributions with unknown means and variances (estimated using Expectation Maximization). We pick the highest-probability cluster as the item's price band.
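A minimal sketch of both clustering variants using scikit-learn on the one-dimensional price vector of a single product type (n = 5 as in our experiments); the relabeling step, which orders clusters by mean price so that band indices are comparable across product types, is our own convention:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

def _order_by_center(labels, centers, n):
    # Relabel so band 0 is the cheapest cluster and band n-1 the most expensive.
    rank = np.empty(n, dtype=int)
    rank[np.argsort(centers)] = np.arange(n)
    return rank[labels]

def kmeans_bands(prices, n=5):
    X = np.asarray(prices, dtype=float).reshape(-1, 1)
    km = KMeans(n_clusters=n, n_init=10, random_state=0).fit(X)
    return _order_by_center(km.labels_, km.cluster_centers_.ravel(), n)

def gmm_bands(prices, n=5):
    X = np.asarray(prices, dtype=float).reshape(-1, 1)
    gmm = GaussianMixture(n_components=n, random_state=0).fit(X)
    # Assign each item to its highest-probability mixture component.
    return _order_by_center(gmm.predict(X), gmm.means_.ravel(), n)
```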
3.1.3 Transaction Balancing. Another method computes cumulative transaction volumes after arranging items in increasing order of price, and determines price band boundaries such that each price band accounts for an equal volume of item transactions. This technique was devised to mitigate data imbalance in the subsequent step of using price bands to compute price affinities based on user activity.
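A sketch of transaction balancing over a per-product-type item table; the column names `price` and `num_trx` are hypothetical:

```python
import numpy as np
import pandas as pd

def transaction_balanced_bands(items: pd.DataFrame, n=5) -> pd.DataFrame:
    """Assign bands so each covers roughly 1/n of total transaction volume."""
    df = items.sort_values("price").copy()
    share = df["num_trx"].cumsum() / df["num_trx"].sum()  # cumulative volume share
    # Band i covers the (i/n, (i+1)/n] slice of cumulative transaction volume.
    df["band"] = np.clip(np.ceil(share * n).astype(int) - 1, 0, n - 1)
    return df
```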
3.2 User Price Affinity

After assigning price bands to items at the product type level, the next step is to determine user price affinity per product type. In other words, given a product type, we want to predict the probability that the user will purchase an item from a certain price band versus others. For example, say we are considering two price bands, "expensive" and "cheap". Sam might have affinities of 0.8 and 0.2 towards expensive and cheap fitness trackers, but 0.3 and 0.7 towards expensive and cheap bed frames. This indicates that she likes buying expensive fitness trackers but inexpensive bed frames. To predict this, we use historical data to train various machine learning models, using 6 months of data to generate features and the following month for labels. The baseline prediction is based on the transactions in the 6 months.

3.2.1 Baseline. For each user and product type, we take the number of transactions (trx) in each price band (pb_i) and normalize it by the total number of transactions across all price bands for that product type (pt_j) and user to obtain affinity scores:

\[ \mathrm{user\_price\_affinity}(pb_i, pt_j) = \frac{\#\ \mathrm{of\ trx\ in\ } pb_i, pt_j}{\#\ \mathrm{of\ trx\ in\ } pt_j} \tag{1} \]
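Equation 1 is a per-(user, product type) normalization of transaction counts; a pandas sketch over a hypothetical transaction log with columns `user`, `pt` and `band`:

```python
import pandas as pd

def baseline_affinity(trx: pd.DataFrame) -> pd.DataFrame:
    """Eq. 1: # of trx in (pb_i, pt_j) divided by # of trx in pt_j, per user."""
    counts = trx.groupby(["user", "pt", "band"]).size().rename("n_trx").reset_index()
    totals = counts.groupby(["user", "pt"])["n_trx"].transform("sum")
    counts["affinity"] = counts["n_trx"] / totals
    return counts
```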
3.2.2 Machine Learning. We cast user price affinity prediction as a multi-class classification (supervised machine learning) problem, with the number of classes equal to the number of price bands (n). For each user and product type, we use features such as the number of transactions, add-to-carts and views of items per price band per month (over 6 months) by that user for that product type. We did not have ground truth data for labels; as a proxy, we use the price band with the maximum number of transactions by the user for that product type as the label. Labels are aggregated over the month following the last month used for feature generation. Thus, each data point is used to predict the price affinities of a specific user towards different price bands within a particular product type. We use these features and labels to train and test multi-class Logistic Regression (LR) [23] and Decision Tree (DT) [25] models.

In LR, for each input data point (feature vector x and label y), the model learns a weight vector w and outputs a probability distribution over the price bands (pb) for a given user and product type (pt), which represents their affinity:

\[ \mathrm{user\_price\_affinity}(pb_i, pt_j) = \frac{\exp(w_i^T x)}{\sum_{k=1}^{n} \exp(w_k^T x)} \tag{2} \]

We use two variants: unweighted (LR-unbal), the vanilla model, and weighted (LR-bal), which accounts for class imbalance by assigning each data point a weight in the loss/gradient computation. The weight balancing heuristic used [21] is inversely proportional to class frequencies: n_samples / (n_classes * count_y), where y is the class label.

In DT, the data is recursively split on a chosen feature at each step. We also consider random forests, which fit multiple decision trees on a number of smaller samples of the data; the final output is the average of the individual tree outputs.
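The hyperparameters explored in Section 4.2 (aggregation depth, elastic net weights) suggest a Spark ML implementation [39]; an equivalent scikit-learn sketch is shown below, where `class_weight="balanced"` applies exactly the n_samples / (n_classes * count_y) reweighting of [21], and X / y stand for the per-(user, product type) feature vectors and max-transaction-band labels described above:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# LR-unbal: vanilla multi-class logistic regression (Eq. 2 via softmax).
lr_unbal = LogisticRegression(max_iter=1000)
# LR-bal: samples reweighted inversely to class frequency in the loss.
lr_bal = LogisticRegression(max_iter=1000, class_weight="balanced")
# Decision tree, and a random forest averaging trees fit on bootstrap samples.
dt = DecisionTreeClassifier(criterion="gini", max_depth=20)
rf = RandomForestClassifier(n_estimators=100, max_depth=20, random_state=0)

# lr_bal.fit(X_train, y_train)
# affinities = lr_bal.predict_proba(X_test)  # one score per price band
```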
3.3 Item Price Similarity

Another input is the similarity between price bands of items across product types, based on user transaction patterns. For example, users who purchase medium-priced televisions might be likely to purchase high-priced sound bars, making these (pt, pb) pairs similar. This is also used as a feature while re-ranking, to capture item-item price similarity.

Pearson Correlation [2]: We compute the Pearson correlation (rho) between observed per-user transaction counts for different (product type, price band) pairs (say (pt_a, pb_i) and (pt_b, pb_j)), and use these as the price similarity scores:

\[ \mathrm{price2price} = \rho_{X,Y} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \, \sigma_Y} \tag{3} \]

Matrix Factorization [24]: We learn latent representations for (product type, price band) pairs by creating a user-(product type, price band) transaction matrix and factorizing it. U (m x d) denotes the user representations (m users) and V (n x d) denotes the (product type, price band) representations (n pairs). Embeddings are learned such that UV^T is a good approximation of the transaction matrix T. Cosine similarity between these low-dimensional vectors is used as the price similarity score:

\[ \mathrm{price2price} = \cos(\theta) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert} \tag{4} \]
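Both similarity scores operate on the user x (product type, price band) transaction matrix T; a numpy sketch, using a truncated SVD as a stand-in for whatever factorization is actually learned:

```python
import numpy as np

def pearson_price2price(T: np.ndarray) -> np.ndarray:
    """Eq. 3: correlation between columns of T, i.e. (pt, pb) pairs, over users."""
    return np.corrcoef(T, rowvar=False)

def mf_price2price(T: np.ndarray, d: int = 32) -> np.ndarray:
    """Eq. 4: factorize T ~ U V^T, then cosine similarity between rows of V."""
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    V = Vt[:d].T * s[:d]                               # (pt, pb) embeddings
    V = V / (np.linalg.norm(V, axis=1, keepdims=True) + 1e-12)
    return V @ V.T                                     # n x n similarity matrix
```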
3.4 Re-Ranking

To tie everything together, we have a re-ranker engine that is capable of incorporating user price understanding and item price similarity into any item recommendation set, such as Viewed also Viewed and Bought also Bought. We use an inference function to combine features related to user preference and item relevance, and predict user-item interactions:

\[ P(u \text{ interacts with } r \mid u \text{ just interacted with } i) = f(g(u, r), h(i, r)) \tag{5} \]

where u is the user, i is the anchor item, r is the recommended item, g(u, r) represents u's preference for r, and h(i, r) represents the item relevance between i and r. Currently, the inference function we use is simple logistic regression, where the user preference and relevance scores are combined linearly. The weights can be learned either at the global level (i.e. the same weights across all product types) or at the product type level. More details are given in Section 4.4.
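At serving time, the re-ranker simply scores each candidate with the learned linear function and sorts; a sketch of Eq. 6/7 below, with the default weight values purely illustrative (Section 4.4 reports a learned w_1 : w_2 : w_3 ratio of 33:3:1 after feature scaling):

```python
def rerank(candidates, w0=0.0, w1=33.0, w2=3.0, w3=1.0):
    """candidates: dicts with 'relevance', 'affinity' and optional 'p2p' scores."""
    def score(c):
        return (w0 + w1 * c["relevance"]
                   + w2 * c["affinity"]
                   + w3 * c.get("p2p", 0.0))
    return sorted(candidates, key=score, reverse=True)
```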
4 EXPERIMENTS AND RESULTS

We use a real-world proprietary e-commerce dataset from walmart.com to demonstrate results. We determine price bands for a few million items and predict price affinity scores for millions of users across around 6000 product types.

4.1 Item Price Bands

We explored the trade-off between granularity, which makes user affinity scores more useful, and data sparsity in user-price band interactions as the number of price bands per product type increases, and decided to use n = 5 item price bands for our experiments. We evaluate the discovered item price bands qualitatively, since ground truth labels are unavailable. One way is to look at how the price ranges of different products are split by the different methods described in Section 3.1. Of these (as shown in Figure 1), we pick k-means and transaction balancing (transac-bal) as the methods to evaluate further in the subsequent steps of predicting user affinities and re-ranking recommendations. We observe that clustering methods such as k-means put fewer very high priced items into the higher price bands, whereas equalizing the number of transactions puts fewer items into the lower price bands, which is expected since less expensive items usually have more transactions. We also randomly sample items and inspect the quality of the price bands. An example of televisions from the 5 price bands (v low-0, low-1, medium-2, high-3 and v high-4) is shown in Figure 2.

Figure 1: Different price bands obtained for k-means and transac-bal for product type Televisions

Figure 2: An example of items in different price bands obtained for product type Televisions

4.2 User Price Affinity

We hold out 20% of the data to test the trained user price affinity models described in Section 3.2. We use precision, recall and F1 score, which are common multi-class classification evaluation metrics, to assess the performance of the different models; since the classes in the data are not balanced, accuracy is not a good metric. Additionally, we use the Mean Reciprocal Rank (MRR) to check whether, even when the max-transaction price band (the ground truth label) does not receive the maximum price affinity score, it still receives a good rank. Results for the different models when price bands are determined using k-means and transac-bal are shown in Table 1 and Table 2 respectively. Random forests did not give much improvement over simple decision trees, so we omit those results.

For the baseline, we obtain an overall MRR of around 0.51 for k-means and around 0.37 for transac-bal. All the machine learning methods performed better than the baseline. For logistic regression, we explored the hyperparameters aggregation depth [2, 4], maximum iterations [100, 1000], regularization [0, 0.01] and elastic net weights [0.4, 0.8], and obtained an overall MRR of around 0.85 for k-means and around 0.79 for transac-bal. For decision trees, we explored the hyperparameters impurity ["entropy", "gini"], maximum depth [10, 20, 30] and maximum bins [16, 32, 64], and obtained an overall MRR of around 0.85 for k-means and around 0.76 for transac-bal. We observe that weighted / class-balanced logistic regression performs best for both item price banding strategies, but performance varies across price bands, as seen in Figure 3: metrics fall for higher price bands in the k-means case and remain at similar levels in the transac-bal case.

Table 1: k-means to determine Item Price Bands

Method     Price Band   Prec    Rec     F1      MRR
Baseline   V Low        0.51    0.56    0.53    0.77
Baseline   Low          0.36    0.33    0.34    0.64
Baseline   Medium       0.11    0.09    0.10    0.40
Baseline   High         0.02    0.02    0.02    0.26
Baseline   V High       0.002   0.001   0.001   0.20
LR-unbal   V Low        0.80    0.87    0.83    0.92
LR-unbal   Low          0.72    0.71    0.71    0.85
LR-unbal   Medium       0.67    0.52    0.59    0.71
LR-unbal   High         0.57    0.21    0.31    0.46
LR-unbal   V High       0.87    0.02    0.05    0.24
LR-bal     V Low        0.84    0.83    0.83    0.88
LR-bal     Low          0.73    0.70    0.71    0.83
LR-bal     Medium       0.59    0.63    0.61    0.78
LR-bal     High         0.46    0.60    0.52    0.74
LR-bal     V High       0.11    0.37    0.18    0.58
DT         V Low        0.83    0.81    0.82    0.89
DT         Low          0.65    0.75    0.69    0.87
DT         Medium       0.61    0.46    0.53    0.68
DT         High         0.54    0.35    0.43    0.52
DT         V High       0.05    0.01    0.01    0.21

Table 2: transac-bal to determine Item Price Bands

Method     Price Band   Prec    Rec     F1      MRR
Baseline   V Low        0.06    0.07    0.06    0.48
Baseline   Low          0.11    0.12    0.11    0.41
Baseline   Medium       0.16    0.17    0.17    0.40
Baseline   High         0.22    0.22    0.22    0.42
Baseline   V High       0.46    0.43    0.44    0.58
LR-unbal   V Low        0.59    0.40    0.48    0.54
LR-unbal   Low          0.67    0.32    0.44    0.53
LR-unbal   Medium       0.63    0.40    0.49    0.65
LR-unbal   High         0.63    0.52    0.57    0.74
LR-unbal   V High       0.65    0.89    0.75    0.93
LR-bal     V Low        0.51    0.55    0.53    0.69
LR-bal     Low          0.60    0.45    0.52    0.64
LR-bal     Medium       0.58    0.55    0.56    0.73
LR-bal     High         0.59    0.58    0.59    0.75
LR-bal     V High       0.76    0.81    0.78    0.87
DT         V Low        0.59    0.32    0.42    0.50
DT         Low          0.58    0.39    0.47    0.57
DT         Medium       0.59    0.45    0.51    0.63
DT         High         0.61    0.47    0.53    0.69
DT         V High       0.65    0.86    0.74    0.91

Table 3: Evaluation Metrics for Different Price Affinity Models (Tables 1 and 2)

Figure 3: Comparison of Evaluation Metrics for Balanced Logistic Regression with k-means vs transac-bal item price banding

4.3 Item Price Similarity

Figure 4 shows an example of the (product type, price band) pairs most similar to Medium-2 priced Bed Sheets. We leave quantitative evaluation of these methods to the downstream re-ranking application.

Figure 4: Top 10 similar (Product Type, Price Band) pairs to Medium-2 Bed Sheets

4.4 Re-Ranking

We show how the price understanding models perform when implemented on the "customers who viewed also viewed" (VAV) application (an example is shown in Figure 5). We take a few million anchor items from VAV and limit to N <= 30 recommendations for each anchor item, ranked by a "relevance" score. This relevance score is based on item-item features such as the number of co-views, title match and popularity. The price understanding model offers two additional features for re-ranking on top of the relevance score: the user price affinity score and the price2price similarity score.

Figure 5: Viewed also Viewed recommendations for a dog food container item

We first test the various methods used to develop user price affinity. We start with the two main methods for item price banding: k-means and transaction balancing. For each item price banding model, we have four variations for user price affinity: baseline, logistic regression (unbalanced), logistic regression (balanced) and decision trees, giving a total of eight versions of user price affinity scores. The inference function for re-ranking is balanced logistic regression:

\[ y = w_0 + w_1 \times \mathrm{relevance} + w_2 \times \mathrm{user\_price\_affinity} \tag{6} \]

The weights in the equation above are trained at the global level (as opposed to at each product type level) and are optimized for items that are co-viewed within each user session. To evaluate performance, we use the common ranking evaluation metrics Normalized Discounted Cumulative Gain (NDCG) [19], Mean Hit Rate (MHR), Mean Reciprocal Rank (MRR) and Mean Average Precision (MAP) [27] [7]. The offline evaluation results are shown in Table 4; we limit the evaluation metrics to the top 5 recommendations. We observe that all the models outperform the relevance-only model (no re-ranking). The best performing model uses transaction balancing for price banding and applies weighted (balanced) logistic regression to derive user price affinity scores.

Table 4: Evaluation Metrics for Different Re-ranking Experiments (using Price Affinity only)

Item Price Band Method   User Price Affinity Method   NDCG@5   MHR@5    MRR@5    MAP@5
-                        Relevance only               0.4375   0.8480   0.5418   0.2048
k-means                  Baseline                     0.4381   0.8493   0.5418   0.2053
k-means                  LR (unbalanced)              0.4386   0.8504   0.5417   0.2057
k-means                  LR (balanced)                0.4388   0.8508   0.5417   0.2057
k-means                  Decision Trees               0.4385   0.8503   0.5416   0.2056
transac-bal              Baseline                     0.4384   0.8503   0.5415   0.2054
transac-bal              LR (unbalanced)              0.4388   0.8519   0.5410   0.2057
transac-bal              LR (balanced)                0.4389   0.8524   0.5408   0.2057
transac-bal              Decision Trees               0.4386   0.8513   0.5411   0.2056

We now extend the previously established best performing model with item price similarity information between the anchor item and the recommended item. We compare two variations of item similarity scores: Pearson correlation and matrix factorization. The inference function now has an additional feature:

\[ y = w_0 + w_1 \times \mathrm{relevance} + w_2 \times \mathrm{user\_price\_affinity} + w_3 \times \mathrm{price2price\_similarity} \tag{7} \]

Again, we adopt a balanced logistic regression model to train the above objective function and learn the weights at the global level. The results are shown in Table 5. We observe an even greater boost in performance from adding the price2price feature. Overall, the best performing model uses price2price scores derived from matrix factorization. Compared to the relevance-only model, this method shows a 0.64% improvement in NDCG, a 1% improvement in MHR, and a 0.93% improvement in MAP. The improvements in NDCG, MHR and MAP@5 are statistically significant at the 5% level in our offline evaluation. Though the MRR is slightly lower, the difference is not statistically significant. Also, since 5-6 recommended items are typically shown on the first pane of the module, metrics such as MHR become more important.

Table 5: Evaluation Metrics for Different Re-ranking Experiments (using best Price Affinity with Price Similarity)

Price Similarity Method   NDCG@5   MHR@5    MRR@5    MAP@5
Relevance only            0.4375   0.8480   0.5418   0.2048
Pearson Correlation       0.4402   0.8557   0.5403   0.2067
Matrix Factorization      0.4403   0.8565   0.5400   0.2067

We further study the weights w_1, w_2, w_3 from the inference function to gauge feature importance. After adjusting for feature variance (standard scaling of features), the ratio among the weights is w_1 : w_2 : w_3 = 33 : 3 : 1. This tells us that the relevance score from the VAV model contributes the most even during re-ranking, but the price-related features also add value, with the user price affinity feature carrying greater weight than the price2price feature.
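For reference, the top-k metrics reported above can be computed per anchor item and averaged across anchors; a minimal sketch, assuming binary relevance labels for the ranked list (the paper does not spell out its exact MHR definition; here a hit means any relevant item in the top k):

```python
import numpy as np

def metrics_at_k(labels, k=5):
    """labels: 1/0 relevance of the ranked recommendations for one anchor."""
    l = np.asarray(labels[:k], dtype=float)
    discounts = np.log2(np.arange(2, len(l) + 2))
    dcg, idcg = (l / discounts).sum(), (np.sort(l)[::-1] / discounts).sum()
    hits = np.flatnonzero(l)                       # 0-based ranks of hits
    return {
        "NDCG": dcg / idcg if idcg > 0 else 0.0,
        "HR":   1.0 if len(hits) else 0.0,
        "RR":   1.0 / (hits[0] + 1) if len(hits) else 0.0,
        "AP":   float(np.mean([l[:i + 1].mean() for i in hits])) if len(hits) else 0.0,
    }
```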
5 CONCLUSION

In this paper, we discuss a novel approach to incorporate price-related user-item signals into recommender systems to personalize their output. This is done by assigning price bands to items of different types, using historical user-item data to predict user price affinity, and using this affinity along with an item price band similarity score to re-rank item recommendations anchored on a (user, item) pair. We demonstrate statistically significant improvement in offline ranking metrics after explicitly including price inputs (user price affinity using balanced logistic regression with transaction-balanced price bands; item price similarity using matrix factorization). To compute price affinities, other user-website interaction data, such as a user's historical search queries, can also be used. In the future, we plan to learn embeddings which implicitly encode item price information, together with user representations such that the similarity between user and item embeddings is indicative of price affinity. We can also experiment with other pairwise or listwise learning-to-rank methods to improve the current pointwise ranking function.
REFERENCES

[1] Grigor Aslanyan, Aritra Mandal, Prathyusha Senthil Kumar, Amit Jaiswal, and Manojkumar Rangasamy Kannadasan. 2020. Personalized Ranking in eCommerce Search. In Companion Proceedings of the Web Conference 2020. 96–97.
[2] Jacob Benesty, Jingdong Chen, Yiteng Huang, and Israel Cohen. 2009. Pearson Correlation Coefficient. In Noise Reduction in Speech Processing. Springer, 1–4.
[3] Stephen Bonner and Flavian Vasile. 2018. Causal Embeddings for Recommendation. In Proceedings of the 12th ACM Conference on Recommender Systems. 104–112.
[4] Pei-Yu Chen and Shin-yi Wu. 2007. Does Collaborative Filtering Technology Impact Sales? Empirical Evidence from Amazon.com. (July 8, 2007).
[5] Pei-Yu Chen, Shin-yi Wu, and Jungsun Yoon. 2004. The Impact of Online Recommendations and Consumer Feedback on Sales. ICIS 2004 Proceedings (2004), 58.
[6] Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, et al. 2016. Wide & Deep Learning for Recommender Systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems. 7–10.
[7] W. Bruce Croft, Donald Metzler, and Trevor Strohman. 2010. Search Engines: Information Retrieval in Practice. Vol. 520. Addison-Wesley.
[8] Alexandru M. Degeratu, Arvind Rangaswamy, and Jianan Wu. 2000. Consumer Choice Behavior in Online and Traditional Supermarkets: The Effects of Brand Name, Price, and Other Search Attributes. International Journal of Research in Marketing 17, 1 (2000), 55–78.
[9] Tülin Erdem, Joffre Swait, and Jordan Louviere. 2002. The Impact of Brand Credibility on Consumer Price Sensitivity. International Journal of Research in Marketing 19, 1 (2002), 1–19.
[10] Carlos A. Gomez-Uribe and Neil Hunt. 2015. The Netflix Recommender System: Algorithms, Business Value, and Innovation. ACM Transactions on Management Information Systems (TMIS) 6, 4 (2015), 1–19.
[11] Mihajlo Grbovic, Vladan Radosavljevic, Nemanja Djuric, Narayan Bhamidipati, Jaikit Savla, Varun Bhagwan, and Doug Sharp. 2015. E-commerce in Your Inbox: Product Recommendations at Scale. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1809–1818.
[12] Asnat Greenstein-Messica and Lior Rokach. 2018. Personal Price Aware Multi-Seller Recommender System: Evidence from eBay. Knowledge-Based Systems 150 (2018), 14–26.
[13] Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, and Xiuqiang He. 2017. DeepFM: A Factorization-Machine based Neural Network for CTR Prediction. arXiv preprint arXiv:1703.04247 (2017).
[14] Sangman Han, Sunil Gupta, and Donald R. Lehmann. 2001. Consumer Price Sensitivity and Price Thresholds. Journal of Retailing 77, 4 (2001), 435–456.
[15] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural Collaborative Filtering. In Proceedings of the 26th International Conference on World Wide Web. 173–182.
[16] Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. 2015. Session-based Recommendations with Recurrent Neural Networks. arXiv preprint arXiv:1511.06939 (2015).
[17] Yifan Hu, Yehuda Koren, and Chris Volinsky. 2008. Collaborative Filtering for Implicit Feedback Datasets. In 2008 Eighth IEEE International Conference on Data Mining. IEEE, 263–272.
[18] Dietmar Jannach, Malte Ludewig, and Lukas Lerche. 2017. Session-Based Item Recommendation in e-Commerce: On Short-Term Intents, Reminders, Trends and Discounts. User Modeling and User-Adapted Interaction 27, 3–5 (Dec. 2017), 351–392. https://doi.org/10.1007/s11257-017-9194-1
[19] Kalervo Järvelin and Jaana Kekäläinen. 2002. Cumulated Gain-based Evaluation of IR Techniques. ACM Transactions on Information Systems (TOIS) 20, 4 (2002), 422–446.
[20] Anil Kaul and Dick R. Wittink. 1995. Empirical Generalizations about the Impact of Advertising on Price Sensitivity and Price. Marketing Science 14, 3_supplement (1995), G151–G160.
[21] Gary King and Langche Zeng. 2001. Logistic Regression in Rare Events Data. Political Analysis 9, 2 (2001), 137–163.
[22] Thomas N. Kipf and Max Welling. 2016. Semi-Supervised Classification with Graph Convolutional Networks. arXiv preprint arXiv:1609.02907 (2016).
[23] David G. Kleinbaum, K. Dietz, M. Gail, Mitchel Klein, and Mitchell Klein. 2002. Logistic Regression. Springer.
[24] Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix Factorization Techniques for Recommender Systems. Computer 42, 8 (2009), 30–37.
[25] Wei-Yin Loh. 2014. Fifty Years of Classification and Regression Trees. International Statistical Review 82, 3 (2014), 329–348.
[26] James MacQueen et al. 1967. Some Methods for Classification and Analysis of Multivariate Observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1. Oakland, CA, USA, 281–297.
[27] Christopher D. Manning, Hinrich Schütze, and Prabhakar Raghavan. 2008. Introduction to Information Retrieval. Cambridge University Press.
[28] Barak A. Pearlmutter. 1995. Gradient Calculations for Dynamic Recurrent Neural Networks: A Survey. IEEE Transactions on Neural Networks 6, 5 (1995), 1212–1228.
[29] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830.
[30] Steffen Rendle. 2010. Factorization Machines. In 2010 IEEE International Conference on Data Mining. IEEE, 995–1000.
[31] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2012. BPR: Bayesian Personalized Ranking from Implicit Feedback. arXiv preprint arXiv:1205.2618 (2012).
[32] Douglas A. Reynolds. 2009. Gaussian Mixture Models. Encyclopedia of Biometrics 741 (2009).
[33] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based Collaborative Filtering Recommendation Algorithms. In Proceedings of the 10th International Conference on World Wide Web. 285–295.
[34] Venkatesh Shankar, Arvind Rangaswamy, and Michael Pusateri. 1999. The Online Medium and Customer Price Sensitivity. Working Paper (1999).
[35] Brent Smith and Greg Linden. 2017. Two Decades of Recommender Systems at Amazon.com. IEEE Internet Computing 21, 3 (2017), 12–18.
[36] Jaime Teevan, Susan T. Dumais, and Eric Horvitz. 2005. Personalizing Search via Automated Analysis of Interests and Activities. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '05). ACM, New York, NY, USA, 449–456. https://doi.org/10.1145/1076034.1076111
[37] Flavian Vasile, Elena Smirnova, and Alexis Conneau. 2016. Meta-Prod2Vec: Product Embeddings Using Side-Information for Recommendation. In Proceedings of the 10th ACM Conference on Recommender Systems. 225–232.
[38] Kirk L. Wakefield and J. Jeffrey Inman. 2003. Situational Price Sensitivity: The Role of Consumption Occasion, Social Context and Income. Journal of Retailing 79, 4 (2003), 199–212.
[39] Matei Zaharia, Reynold S. Xin, Patrick Wendell, Tathagata Das, Michael Armbrust, Ankur Dave, Xiangrui Meng, Josh Rosen, Shivaram Venkataraman, Michael J. Franklin, Ali Ghodsi, Joseph Gonzalez, Scott Shenker, and Ion Stoica. 2016. Apache Spark: A Unified Engine for Big Data Processing. Commun. ACM 59, 11 (Oct. 2016), 56–65. https://doi.org/10.1145/2934664
[40] Yu Zheng, Chen Gao, Xiangnan He, Yong Li, and Depeng Jin. 2020. Price-Aware Recommendation with Graph Convolutional Networks. In 2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE, 133–144.