Personalizing Item Recommendation via Price Understanding

Soumya Wadhwa (soumya.wadhwa@walmartlabs.com), Ashish Ranjan (ashish.ranjan@walmartlabs.com), Selene Xu (yue.xu@walmartlabs.com), Jason H.D. Cho (hcho@walmartlabs.com), Sushant Kumar (skumar4@walmartlabs.com), Kannan Achan (kachan@walmartlabs.com)
Walmart Labs, Sunnyvale, California

Copyright (c) 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

ABSTRACT

Personalization has gained a lot of traction in the e-commerce domain, since there is ample evidence for short-term and long-term benefits of understanding user preferences and ensuring user satisfaction. However, effectively personalizing recommendations is a challenging task, especially at scale. Price is often a key consideration for purchases, and user behavior varies widely depending on demographic and psychological factors. While difficult to model, this is an important signal to consider for user-item recommendation. In this paper, we focus on personalizing and improving the relevance of item recommendations for e-commerce users by leveraging price as an essential input. More concretely, we segregate items into price bands indicating how expensive they are, infer user affinity to price bands based on historical behavior, and use features derived from this knowledge to re-rank items in a real-world recommendation scenario. We experiment with various statistical and machine learning methods to determine item price bands, user price affinities and item price similarities, and demonstrate impact on recommendation quality for millions of users and items.

1 INTRODUCTION

Recommender systems are ubiquitous on websites today. Recommendation algorithms can be based on item-item interactions or user-item feedback. In recent times, websites are increasingly focusing on providing an experience tailored to their users [35] [10] [36], with personalization at the segment or individual level. Understanding user preferences and recommending relevant items accordingly has been shown to improve user satisfaction and conversion rates, which is a win-win situation [4] [5]. While essential, scaling the personalization of recommendations anchored on combinations of users and items is very challenging, especially in the e-commerce domain, where millions of users can potentially interact with millions of items.

For most users, price is a key factor in making purchases [14] [8] [34]. Users make price-value trade-offs when they purchase products, and their behavior can vary widely depending on demographic factors such as salary or location, and psychological factors such as money consciousness or particular interest in certain types of products. Consider two users. The first is a sound engineer looking to purchase high-quality headphones for use at work; since this user needs to discern any imperfections, they may be looking to purchase expensive headphones. The second user wants headphones to listen to podcasts; as long as the podcast is intelligible, sound quality is not an issue, and they can buy lower-priced headphones. To effectively personalize their shopping journeys, understanding that the first customer is looking for higher-priced headphones and the second for lower-priced ones will help recommend the products they are looking for.

However, defining what constitutes a high-priced or low-priced item is difficult. In the e-commerce domain, products of course have a price associated with them, but we do not know whether a given price is considered expensive or inexpensive for a given type of product, for example, a light bulb versus a laptop. $100 may be a bargain for the laptop, but the same price tag would make the light bulb very expensive. We therefore need to understand item prices for each product type and categorize items into different price bands (e.g. low vs. high) on that basis. Subsequently, we can start understanding which price bands users are likely to purchase from for different product types.

To summarize, using price to personalize item recommendation is challenging because user price preferences must be inferred implicitly and vary based on the type of product. Additionally, raw item prices are not sufficient to determine whether a product is considered expensive, and need to be standardized so that they can be compared across different types of products. In this paper, we aim to model user price affinity and item price similarity, and utilize them as input signals along with item-item relevance scores to personalize and improve the quality of item recommendations for e-commerce users.
We achieve this using the following:

• Unsupervised methods to divide items into price bands indicating their degree of expensiveness
• Supervised methods to compute user affinity to different price bands based on their historical interactions
• Item and user price-related features to re-rank items in an actual user-item recommendation setting

This is done at the Product Type (PT) level, the most granular level of the product taxonomy available in the Walmart product catalog. We use a large e-commerce dataset and experiment with multiple statistical and machine learning methods to determine item price bands, user price affinities and item price similarities. We quantitatively show the positive impact on recommendation quality of including price-related features in the re-ranking algorithm.

2 RELATED WORK

There has been extensive research on recommender systems and personalization. Many research efforts have focused on collaborative filtering-based techniques. Traditional matrix factorization (MF) models [24] and their variants [17] [31], which incorporate implicit feedback, temporal effects and confidence levels, have proved superior to classic nearest neighbor approaches for recommending items [33]. Factorization machines [30] have also been used for recommendation to overcome feature sparsity issues. Emerging deep learning based solutions [6] [15] [13] have shown promising results for recommendation. Item embeddings can be used to compute item-item similarity and recommend items accordingly [37] [11]. For modeling recommendations based on short session-based data, a sequential Recurrent Neural Network (RNN) [28] based approach can be used to predict the next item [16]. More recently, causal embeddings for recommendation [3] have shown significant improvements over state-of-the-art factorization methods.

Price is an important factor for users making an online purchase. In [34], a conceptual framework is developed to explain the effects of the online medium on customer price sensitivity. User price sensitivity and price thresholds are discussed in [14]. Traditional and online supermarkets are compared in terms of user behavior with respect to brand, price and other search attributes in [8], and price sensitivity is found to be higher online. There are also studies on the impact of advertising [20] and brand credibility [9] on price sensitivity. In [38], the potential effect of the consumption occasion (functional vs. hedonic), social context and household income on users' price sensitivity is analyzed. There is substantial additional literature on consumer price sensitivity.

However, price has received relatively little attention as an input signal for recommendation. Price is used as a feature in [1] for personalization in the e-commerce domain, by taking the ratio of the price of the current item to the average price of previously clicked items. There is also a brief mention of how price can affect a user's affinity towards an item in [17]. The authors of [18] analyze logs to investigate what makes recommendations effective in practice, and include some factors based on "price levels" per product category; however, methods for determining these price levels are not discussed, and user price affinity is incorporated as an average of recent price levels (not per category). Their focus is on the impact of popularity, discounts, reminders and recency on user click behavior. In [12], Willingness To Pay (WTP) distributions per user and product are modeled and used together with discount indication and seller reputation in a context-aware recommendation model to improve recommendation quality. More recently, [40] model the transitive relationship between user-to-item and item-to-price using Graph Convolutional Networks (GCN) [22] to make the learned user representations price-aware. They incorporate prices and categories as nodes along with users and items in a heterogeneous graph. They also treat price as a categorical variable and discretize price values into separate levels based on price ranges, but do not experiment with different methods for doing so.

In our work, we explore several methods to compute item price bands and explicitly model user price affinity for various types of products, such that these input signals can be leveraged generally for use cases such as recommendation and search. We demonstrate results for user-item and item-item price-related features obtained from different model variations, used together with an item-item relevance score to personalize item-anchored recommendations.

3 METHODOLOGY

Our goal is to use past item interaction data (such as clicks and add-to-carts) for a given user to predict their affinity for a particular price band, and eventually incorporate this price understanding into recommendations. To achieve this, items are clustered into price bands at the product type level (Section 3.1), and user activity patterns are then learnt with respect to these item price bands to predict the probability that the user will purchase an item from a particular price band versus others for that product type. These predicted user-item price band affinity scores (Section 3.2) and item-item price band similarity scores (Section 3.3) are used as features along with relevance scores to re-rank item-anchored recommendations (Section 3.4) for personalization using price understanding.

3.1 Item Price Bands

Item prices vary a lot, from less than 10 dollars for a USB drive to thousands of dollars for a QLED television. However, the absolute price alone is insufficient to decide whether an item is expensive. It is also important to take the product type into consideration, since a price of $100 might be low for televisions but high for a USB drive. Thus, we need to create representations for item prices within each product type such that they are directly comparable across different items. We assign each item to one of n bands, using unsupervised methods since labels are unavailable.

3.1.1 Statistical Methods. We first explore statistical methods using item prices for each product type (sketched after the list below).

• Range-Based: For example, say television prices vary from $100 to $5100, and we decide to create 3 price bands with price range ratios 3:5:2. Then one unit of the range becomes ($5100 - $100)/(3+5+2) = $500, and the lowest price band extends from $100 to $100 + 3*$500 (= $1600), the middle from $1600 to $1600 + 5*$500 (= $4100), and the highest from $4100 to $4100 + 2*$500 (= $5100).
• Percentile-Based: For example, in the above situation, say we decide to create 3 price bands holding 30%, 50% and 20% of items. If there are 200 televisions in our item catalog, with 60 TVs priced below $500, 100 TVs priced between $500 and $2500, and 40 TVs priced above $2500, then those boundaries delineate the price bands.

Splitting items into equal bins based on range or percentiles did not work well in practice, due to skew in item price distributions and transaction volumes, and creating unequal bins requires extensive manual tuning.
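Both statistical schemes reduce to simple boundary computations over a product type's price list. The following is a minimal sketch of the two methods, not the production implementation; the function names and the uniform toy data are our own illustration:

```python
import numpy as np

def range_based_bands(prices, ratios=(3, 5, 2)):
    """Split the [min, max] price range into bands proportional to `ratios`.

    E.g. prices spanning $100-$5100 with ratios 3:5:2 give boundaries
    [100, 1600, 4100, 5100], matching the television example above.
    """
    lo, hi = float(min(prices)), float(max(prices))
    unit = (hi - lo) / sum(ratios)
    bounds = [lo]
    for r in ratios:
        bounds.append(bounds[-1] + r * unit)
    return bounds

def percentile_based_bands(prices, shares=(30, 50, 20)):
    """Place boundaries so each band holds the given percentage of items."""
    cuts = np.cumsum(shares)[:-1]                      # e.g. [30, 80]
    return [min(prices), *np.percentile(prices, cuts), max(prices)]

# Toy example: 200 hypothetical television prices.
tv_prices = np.random.default_rng(0).uniform(100, 5100, 200)
print(range_based_bands(tv_prices))
print(percentile_based_bands(tv_prices))
```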
3.1.2 Clustering. Next, we use common clustering methods to automatically group the items of each product type into n clusters based on their price values (a sketch follows the list).

• K-Means [26]: Each item is assigned to the cluster whose mean price value is closest to the item price.
• Gaussian Mixture Model (GMM) [32]: We assume that all the price values are generated from a mixture of a finite number of normal distributions with unknown means and variances (estimated using Expectation Maximization). We pick the highest-probability cluster as the item's price band.
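A minimal sketch of both clustering variants using scikit-learn on the one-dimensional price vector of a single product type (n = 5 as in our experiments); the relabeling step, which orders clusters by mean price so that band indices are comparable across product types, is our own convention:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

def _order_by_center(labels, centers, n):
    # Relabel so band 0 is the cheapest cluster and band n-1 the most expensive.
    rank = np.empty(n, dtype=int)
    rank[np.argsort(centers)] = np.arange(n)
    return rank[labels]

def kmeans_bands(prices, n=5):
    X = np.asarray(prices, dtype=float).reshape(-1, 1)
    km = KMeans(n_clusters=n, n_init=10, random_state=0).fit(X)
    return _order_by_center(km.labels_, km.cluster_centers_.ravel(), n)

def gmm_bands(prices, n=5):
    X = np.asarray(prices, dtype=float).reshape(-1, 1)
    gmm = GaussianMixture(n_components=n, random_state=0).fit(X)
    # Assign each item to its highest-probability mixture component.
    return _order_by_center(gmm.predict(X), gmm.means_.ravel(), n)
```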
3.1.3 Transaction Balancing. Another method computes cumulative transaction volumes after arranging items in increasing order of price, and determines price band boundaries such that each price band accounts for an equal volume of item transactions. This technique was devised to mitigate data imbalance in the subsequent step of using price bands to compute price affinities based on user activity.
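A sketch of transaction balancing over a per-product-type item table; the column names `price` and `num_trx` are hypothetical:

```python
import numpy as np
import pandas as pd

def transaction_balanced_bands(items: pd.DataFrame, n=5) -> pd.DataFrame:
    """Assign bands so each covers roughly 1/n of total transaction volume."""
    df = items.sort_values("price").copy()
    share = df["num_trx"].cumsum() / df["num_trx"].sum()  # cumulative volume share
    # Band i covers the (i/n, (i+1)/n] slice of cumulative transaction volume.
    df["band"] = np.clip(np.ceil(share * n).astype(int) - 1, 0, n - 1)
    return df
```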
3.2 User Price Affinity

After assigning price bands to items at the product type level, the next step is to determine user price affinity per product type. In other words, given a product type, we want to predict the probability that the user will purchase an item from a certain price band versus others. For example, say we are considering two price bands, "expensive" and "cheap". Sam might have affinities of 0.8 and 0.2 towards expensive and cheap fitness trackers, but 0.3 and 0.7 towards expensive and cheap bed frames. This indicates that she likes buying expensive fitness trackers but inexpensive bed frames. To predict this, we use historical data to train various machine learning models, using 6 months of data to generate features and the following month for labels. The baseline prediction is based on the transactions in the 6 months.

3.2.1 Baseline. For each user and product type, we take the number of transactions (trx) in each price band (pb_i) and normalize it by the total number of transactions across all price bands for that product type (pt_j) and user to obtain affinity scores:

\[ \mathrm{user\_price\_affinity}(pb_i, pt_j) = \frac{\#\ \mathrm{of\ trx\ in\ } pb_i, pt_j}{\#\ \mathrm{of\ trx\ in\ } pt_j} \tag{1} \]
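Equation 1 is a per-(user, product type) normalization of transaction counts; a pandas sketch over a hypothetical transaction log with columns `user`, `pt` and `band`:

```python
import pandas as pd

def baseline_affinity(trx: pd.DataFrame) -> pd.DataFrame:
    """Eq. 1: # of trx in (pb_i, pt_j) divided by # of trx in pt_j, per user."""
    counts = trx.groupby(["user", "pt", "band"]).size().rename("n_trx").reset_index()
    totals = counts.groupby(["user", "pt"])["n_trx"].transform("sum")
    counts["affinity"] = counts["n_trx"] / totals
    return counts
```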
3.2.2 Machine Learning. We cast user price affinity prediction as a multi-class classification (supervised machine learning) problem, with the number of classes equal to the number of price bands (n). For each user and product type, we use features such as the number of transactions, add-to-carts and views of items per price band per month (over 6 months) by that user for that product type. We did not have ground truth data for labels; as a proxy, we use the price band with the maximum number of transactions by the user for that product type as the label. Labels are aggregated over the month following the last month used for feature generation. Thus, each data point is used to predict the price affinities of a specific user towards different price bands within a particular product type. We use these features and labels to train and test multi-class Logistic Regression (LR) [23] and Decision Tree (DT) [25] models.

In LR, for each input data point (feature vector x and label y), the model learns a weight vector w and outputs a probability distribution over the price bands (pb) for a given user and product type (pt), which represents their affinity:

\[ \mathrm{user\_price\_affinity}(pb_i, pt_j) = \frac{\exp(w_i^T x)}{\sum_{k=1}^{n} \exp(w_k^T x)} \tag{2} \]

We use two variants: unweighted (LR-unbal), the vanilla model, and weighted (LR-bal), which accounts for class imbalance by assigning each data point a weight in the loss/gradient computation. The weight balancing heuristic used [21] is inversely proportional to class frequencies: n_samples / (n_classes * count_y), where y is the class label.

In DT, the data is recursively split on a chosen feature at each step. We also consider random forests, which fit multiple decision trees on a number of smaller samples of the data; the final output is the average of the individual tree outputs.
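The hyperparameters explored in Section 4.2 (aggregation depth, elastic net weights) suggest a Spark ML implementation [39]; an equivalent scikit-learn sketch is shown below, where `class_weight="balanced"` applies exactly the n_samples / (n_classes * count_y) reweighting of [21], and X / y stand for the per-(user, product type) feature vectors and max-transaction-band labels described above:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# LR-unbal: vanilla multi-class logistic regression (Eq. 2 via softmax).
lr_unbal = LogisticRegression(max_iter=1000)
# LR-bal: samples reweighted inversely to class frequency in the loss.
lr_bal = LogisticRegression(max_iter=1000, class_weight="balanced")
# Decision tree, and a random forest averaging trees fit on bootstrap samples.
dt = DecisionTreeClassifier(criterion="gini", max_depth=20)
rf = RandomForestClassifier(n_estimators=100, max_depth=20, random_state=0)

# lr_bal.fit(X_train, y_train)
# affinities = lr_bal.predict_proba(X_test)  # one score per price band
```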
3.3 Item Price Similarity

Another input is the similarity between price bands of items across product types, based on user transaction patterns. For example, users who purchase medium-priced televisions might be likely to purchase high-priced sound bars, making these (pt, pb) pairs similar. This is also used as a feature while re-ranking, to capture item-item price similarity.

Pearson Correlation [2]: We compute the Pearson correlation (rho) between observed per-user transaction counts for different (product type, price band) pairs (say (pt_a, pb_i) and (pt_b, pb_j)), and use these as the price similarity scores:

\[ \mathrm{price2price} = \rho_{X,Y} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \, \sigma_Y} \tag{3} \]

Matrix Factorization [24]: We learn latent representations for (product type, price band) pairs by creating a user-(product type, price band) transaction matrix and factorizing it. U (m x d) denotes the user representations (m users) and V (n x d) denotes the (product type, price band) representations (n pairs). Embeddings are learned such that UV^T is a good approximation of the transaction matrix T. Cosine similarity between these low-dimensional vectors is used as the price similarity score:

\[ \mathrm{price2price} = \cos(\theta) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert} \tag{4} \]
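Both similarity scores operate on the user x (product type, price band) transaction matrix T; a numpy sketch, using a truncated SVD as a stand-in for whatever factorization is actually learned:

```python
import numpy as np

def pearson_price2price(T: np.ndarray) -> np.ndarray:
    """Eq. 3: correlation between columns of T, i.e. (pt, pb) pairs, over users."""
    return np.corrcoef(T, rowvar=False)

def mf_price2price(T: np.ndarray, d: int = 32) -> np.ndarray:
    """Eq. 4: factorize T ~ U V^T, then cosine similarity between rows of V."""
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    V = Vt[:d].T * s[:d]                               # (pt, pb) embeddings
    V = V / (np.linalg.norm(V, axis=1, keepdims=True) + 1e-12)
    return V @ V.T                                     # n x n similarity matrix
```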
3.4 Re-Ranking

To tie everything together, we have a re-ranker engine that is capable of incorporating user price understanding and item price similarity into any item recommendation set, such as Viewed also Viewed and Bought also Bought. We use an inference function to combine features related to user preference and item relevance, and predict user-item interactions:

\[ P(u \text{ interacts with } r \mid u \text{ just interacted with } i) = f(g(u, r), h(i, r)) \tag{5} \]

where u is the user, i is the anchor item, r is the recommended item, g(u, r) represents u's preference for r, and h(i, r) represents the item relevance between i and r. Currently, the inference function we use is simple logistic regression, where the user preference and relevance scores are combined linearly. The weights can be learned either at the global level (i.e. the same weights across all product types) or at the product type level. More details are given in Section 4.4.
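At serving time, the re-ranker simply scores each candidate with the learned linear function and sorts; a sketch of Eq. 6/7 below, with the default weight values purely illustrative (Section 4.4 reports a learned w_1 : w_2 : w_3 ratio of 33:3:1 after feature scaling):

```python
def rerank(candidates, w0=0.0, w1=33.0, w2=3.0, w3=1.0):
    """candidates: dicts with 'relevance', 'affinity' and optional 'p2p' scores."""
    def score(c):
        return (w0 + w1 * c["relevance"]
                   + w2 * c["affinity"]
                   + w3 * c.get("p2p", 0.0))
    return sorted(candidates, key=score, reverse=True)
```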
4 EXPERIMENTS AND RESULTS

We use a real-world proprietary e-commerce dataset from walmart.com to demonstrate results. We determine price bands for a few million items and predict price affinity scores for millions of users across around 6000 product types.

4.1 Item Price Bands

We explored the trade-off between granularity, which makes user affinity scores more useful, and data sparsity in user-price band interactions as the number of price bands per product type increases, and decided to use n = 5 item price bands for our experiments. We evaluate the discovered item price bands qualitatively, since ground truth labels are unavailable. One way is to look at how the price ranges of different products are split by the different methods described in Section 3.1. Of these (as shown in Figure 1), we pick k-means and transaction balancing (transac-bal) as the methods to evaluate further in the subsequent steps of predicting user affinities and re-ranking recommendations. We observe that clustering methods such as k-means put fewer very high priced items into the higher price bands, whereas equalizing the number of transactions puts fewer items into the lower price bands, which is expected since less expensive items usually have more transactions. We also randomly sample items and inspect the quality of the price bands. An example of televisions from the 5 price bands (v low-0, low-1, medium-2, high-3 and v high-4) is shown in Figure 2.

Figure 1: Different price bands obtained for k-means and transac-bal for product type Televisions

Figure 2: An example of items in different price bands obtained for product type Televisions

4.2 User Price Affinity

We hold out 20% of the data to test the trained user price affinity models described in Section 3.2. We use precision, recall and F1 score, which are common multi-class classification evaluation metrics, to assess the performance of the different models; since the classes in the data are not balanced, accuracy is not a good metric. Additionally, we use the Mean Reciprocal Rank (MRR) to check whether, even when the max-transaction price band (the ground truth label) does not receive the maximum price affinity score, it still receives a good rank. Results for the different models when price bands are determined using k-means and transac-bal are shown in Table 1 and Table 2 respectively. Random forests did not give much improvement over simple decision trees, so we omit those results.

For the baseline, we obtain an overall MRR of around 0.51 for k-means and around 0.37 for transac-bal. All the machine learning methods performed better than the baseline. For logistic regression, we explored the hyperparameters aggregation depth [2, 4], maximum iterations [100, 1000], regularization [0, 0.01] and elastic net weights [0.4, 0.8], and obtained an overall MRR of around 0.85 for k-means and around 0.79 for transac-bal. For decision trees, we explored the hyperparameters impurity ["entropy", "gini"], maximum depth [10, 20, 30] and maximum bins [16, 32, 64], and obtained an overall MRR of around 0.85 for k-means and around 0.76 for transac-bal. We observe that weighted / class-balanced logistic regression performs best for both item price banding strategies, but performance varies across price bands, as seen in Figure 3: metrics fall for higher price bands in the k-means case and remain at similar levels in the transac-bal case.

Table 1: k-means to determine Item Price Bands

Method     Price Band   Prec    Rec     F1      MRR
Baseline   V Low        0.51    0.56    0.53    0.77
Baseline   Low          0.36    0.33    0.34    0.64
Baseline   Medium       0.11    0.09    0.10    0.40
Baseline   High         0.02    0.02    0.02    0.26
Baseline   V High       0.002   0.001   0.001   0.20
LR-unbal   V Low        0.80    0.87    0.83    0.92
LR-unbal   Low          0.72    0.71    0.71    0.85
LR-unbal   Medium       0.67    0.52    0.59    0.71
LR-unbal   High         0.57    0.21    0.31    0.46
LR-unbal   V High       0.87    0.02    0.05    0.24
LR-bal     V Low        0.84    0.83    0.83    0.88
LR-bal     Low          0.73    0.70    0.71    0.83
LR-bal     Medium       0.59    0.63    0.61    0.78
LR-bal     High         0.46    0.60    0.52    0.74
LR-bal     V High       0.11    0.37    0.18    0.58
DT         V Low        0.83    0.81    0.82    0.89
DT         Low          0.65    0.75    0.69    0.87
DT         Medium       0.61    0.46    0.53    0.68
DT         High         0.54    0.35    0.43    0.52
DT         V High       0.05    0.01    0.01    0.21

Table 2: transac-bal to determine Item Price Bands

Method     Price Band   Prec    Rec     F1      MRR
Baseline   V Low        0.06    0.07    0.06    0.48
Baseline   Low          0.11    0.12    0.11    0.41
Baseline   Medium       0.16    0.17    0.17    0.40
Baseline   High         0.22    0.22    0.22    0.42
Baseline   V High       0.46    0.43    0.44    0.58
LR-unbal   V Low        0.59    0.40    0.48    0.54
LR-unbal   Low          0.67    0.32    0.44    0.53
LR-unbal   Medium       0.63    0.40    0.49    0.65
LR-unbal   High         0.63    0.52    0.57    0.74
LR-unbal   V High       0.65    0.89    0.75    0.93
LR-bal     V Low        0.51    0.55    0.53    0.69
LR-bal     Low          0.60    0.45    0.52    0.64
LR-bal     Medium       0.58    0.55    0.56    0.73
LR-bal     High         0.59    0.58    0.59    0.75
LR-bal     V High       0.76    0.81    0.78    0.87
DT         V Low        0.59    0.32    0.42    0.50
DT         Low          0.58    0.39    0.47    0.57
DT         Medium       0.59    0.45    0.51    0.63
DT         High         0.61    0.47    0.53    0.69
DT         V High       0.65    0.86    0.74    0.91

Table 3: Evaluation Metrics for Different Price Affinity Models (Tables 1 and 2)

Figure 3: Comparison of Evaluation Metrics for Balanced Logistic Regression with k-means vs transac-bal item price banding

4.3 Item Price Similarity

Figure 4 shows an example of the (product type, price band) pairs most similar to Medium-2 priced Bed Sheets. We leave quantitative evaluation of these methods to the downstream re-ranking application.

Figure 4: Top 10 similar (Product Type, Price Band) pairs to Medium-2 Bed Sheets

4.4 Re-Ranking

We show how the price understanding models perform when implemented on the "customers who viewed also viewed" (VAV) application (an example is shown in Figure 5). We take a few million anchor items from VAV and limit to N <= 30 recommendations for each anchor item, ranked by a "relevance" score. This relevance score is based on item-item features such as the number of co-views, title match and popularity. The price understanding model offers two additional features for re-ranking on top of the relevance score: the user price affinity score and the price2price similarity score.

Figure 5: Viewed also Viewed recommendations for a dog food container item

We first test the various methods used to develop user price affinity. We start with the two main methods for item price banding: k-means and transaction balancing. For each item price banding model, we have four variations for user price affinity: baseline, logistic regression (unbalanced), logistic regression (balanced) and decision trees, giving a total of eight versions of user price affinity scores. The inference function for re-ranking is balanced logistic regression:

\[ y = w_0 + w_1 \times \mathrm{relevance} + w_2 \times \mathrm{user\_price\_affinity} \tag{6} \]

The weights in the equation above are trained at the global level (as opposed to at each product type level) and are optimized for items that are co-viewed within each user session. To evaluate performance, we use the common ranking evaluation metrics Normalized Discounted Cumulative Gain (NDCG) [19], Mean Hit Rate (MHR), Mean Reciprocal Rank (MRR) and Mean Average Precision (MAP) [27] [7]. The offline evaluation results are shown in Table 4; we limit the evaluation metrics to the top 5 recommendations. We observe that all the models outperform the relevance-only model (no re-ranking). The best performing model uses transaction balancing for price banding and applies weighted (balanced) logistic regression to derive user price affinity scores.

Table 4: Evaluation Metrics for Different Re-ranking Experiments (using Price Affinity only)

Item Price Band Method   User Price Affinity Method   NDCG@5   MHR@5    MRR@5    MAP@5
-                        Relevance only               0.4375   0.8480   0.5418   0.2048
k-means                  Baseline                     0.4381   0.8493   0.5418   0.2053
k-means                  LR (unbalanced)              0.4386   0.8504   0.5417   0.2057
k-means                  LR (balanced)                0.4388   0.8508   0.5417   0.2057
k-means                  Decision Trees               0.4385   0.8503   0.5416   0.2056
transac-bal              Baseline                     0.4384   0.8503   0.5415   0.2054
transac-bal              LR (unbalanced)              0.4388   0.8519   0.5410   0.2057
transac-bal              LR (balanced)                0.4389   0.8524   0.5408   0.2057
transac-bal              Decision Trees               0.4386   0.8513   0.5411   0.2056

We now extend the previously established best performing model with item price similarity information between the anchor item and the recommended item. We compare two variations of item similarity scores: Pearson correlation and matrix factorization. The inference function now has an additional feature:

\[ y = w_0 + w_1 \times \mathrm{relevance} + w_2 \times \mathrm{user\_price\_affinity} + w_3 \times \mathrm{price2price\_similarity} \tag{7} \]

Again, we adopt a balanced logistic regression model to train the above objective function and learn the weights at the global level. The results are shown in Table 5. We observe an even greater boost in performance from adding the price2price feature. Overall, the best performing model uses price2price scores derived from matrix factorization. Compared to the relevance-only model, this method shows a 0.64% improvement in NDCG, a 1% improvement in MHR, and a 0.93% improvement in MAP. The improvements in NDCG, MHR and MAP@5 are statistically significant at the 5% level in our offline evaluation. Though the MRR is slightly lower, the difference is not statistically significant. Also, since 5-6 recommended items are typically shown on the first pane of the module, metrics such as MHR become more important.

Table 5: Evaluation Metrics for Different Re-ranking Experiments (using best Price Affinity with Price Similarity)

Price Similarity Method   NDCG@5   MHR@5    MRR@5    MAP@5
Relevance only            0.4375   0.8480   0.5418   0.2048
Pearson Correlation       0.4402   0.8557   0.5403   0.2067
Matrix Factorization      0.4403   0.8565   0.5400   0.2067

We further study the weights w_1, w_2, w_3 from the inference function to gauge feature importance. After adjusting for feature variance (standard scaling of features), the ratio among the weights is w_1 : w_2 : w_3 = 33 : 3 : 1. This tells us that the relevance score from the VAV model contributes the most even during re-ranking, but the price-related features also add value, with the user price affinity feature carrying greater weight than the price2price feature.
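For reference, the top-k metrics reported above can be computed per anchor item and averaged across anchors; a minimal sketch, assuming binary relevance labels for the ranked list (the paper does not spell out its exact MHR definition; here a hit means any relevant item in the top k):

```python
import numpy as np

def metrics_at_k(labels, k=5):
    """labels: 1/0 relevance of the ranked recommendations for one anchor."""
    l = np.asarray(labels[:k], dtype=float)
    discounts = np.log2(np.arange(2, len(l) + 2))
    dcg, idcg = (l / discounts).sum(), (np.sort(l)[::-1] / discounts).sum()
    hits = np.flatnonzero(l)                       # 0-based ranks of hits
    return {
        "NDCG": dcg / idcg if idcg > 0 else 0.0,
        "HR":   1.0 if len(hits) else 0.0,
        "RR":   1.0 / (hits[0] + 1) if len(hits) else 0.0,
        "AP":   float(np.mean([l[:i + 1].mean() for i in hits])) if len(hits) else 0.0,
    }
```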
5 CONCLUSION

In this paper, we discuss a novel approach to incorporate price-related user-item signals into recommender systems to personalize their output. This is done by assigning price bands to items of different types, using historical user-item data to predict user price affinity, and using this affinity along with an item price band similarity score to re-rank item recommendations anchored on a (user, item) pair. We demonstrate statistically significant improvement in offline ranking metrics after explicitly including price inputs (user price affinity using balanced logistic regression with transaction-balanced price bands; item price similarity using matrix factorization). To compute price affinities, other user-website interaction data, such as a user's historical search queries, can also be used. In the future, we plan to learn embeddings which implicitly encode item price information, together with user representations such that the similarity between user and item embeddings is indicative of price affinity. We can also experiment with other pairwise or listwise learning-to-rank methods to improve the current pointwise ranking function.
REFERENCES

[1] Grigor Aslanyan, Aritra Mandal, Prathyusha Senthil Kumar, Amit Jaiswal, and Manojkumar Rangasamy Kannadasan. 2020. Personalized Ranking in eCommerce Search. In Companion Proceedings of the Web Conference 2020. 96–97.
[2] Jacob Benesty, Jingdong Chen, Yiteng Huang, and Israel Cohen. 2009. Pearson Correlation Coefficient. In Noise Reduction in Speech Processing. Springer, 1–4.
[3] Stephen Bonner and Flavian Vasile. 2018. Causal Embeddings for Recommendation. In Proceedings of the 12th ACM Conference on Recommender Systems. 104–112.
[4] Pei-Yu Chen and Shin-yi Wu. 2007. Does Collaborative Filtering Technology Impact Sales? Empirical Evidence from Amazon.com. (July 8, 2007).
[5] Pei-Yu Chen, Shin-yi Wu, and Jungsun Yoon. 2004. The Impact of Online Recommendations and Consumer Feedback on Sales. ICIS 2004 Proceedings (2004), 58.
[6] Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, et al. 2016. Wide & Deep Learning for Recommender Systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems. 7–10.
[7] W. Bruce Croft, Donald Metzler, and Trevor Strohman. 2010. Search Engines: Information Retrieval in Practice. Vol. 520. Addison-Wesley.
[8] Alexandru M. Degeratu, Arvind Rangaswamy, and Jianan Wu. 2000. Consumer Choice Behavior in Online and Traditional Supermarkets: The Effects of Brand Name, Price, and Other Search Attributes. International Journal of Research in Marketing 17, 1 (2000), 55–78.
[9] Tülin Erdem, Joffre Swait, and Jordan Louviere. 2002. The Impact of Brand Credibility on Consumer Price Sensitivity. International Journal of Research in Marketing 19, 1 (2002), 1–19.
[10] Carlos A. Gomez-Uribe and Neil Hunt. 2015. The Netflix Recommender System: Algorithms, Business Value, and Innovation. ACM Transactions on Management Information Systems (TMIS) 6, 4 (2015), 1–19.
[11] Mihajlo Grbovic, Vladan Radosavljevic, Nemanja Djuric, Narayan Bhamidipati, Jaikit Savla, Varun Bhagwan, and Doug Sharp. 2015. E-commerce in Your Inbox: Product Recommendations at Scale. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1809–1818.
[12] Asnat Greenstein-Messica and Lior Rokach. 2018. Personal Price Aware Multi-Seller Recommender System: Evidence from eBay. Knowledge-Based Systems 150 (2018), 14–26.
[13] Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, and Xiuqiang He. 2017. DeepFM: A Factorization-Machine based Neural Network for CTR Prediction. arXiv preprint arXiv:1703.04247 (2017).
[14] Sangman Han, Sunil Gupta, and Donald R. Lehmann. 2001. Consumer Price Sensitivity and Price Thresholds. Journal of Retailing 77, 4 (2001), 435–456.
[15] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural Collaborative Filtering. In Proceedings of the 26th International Conference on World Wide Web. 173–182.
[16] Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. 2015. Session-based Recommendations with Recurrent Neural Networks. arXiv preprint arXiv:1511.06939 (2015).
[17] Yifan Hu, Yehuda Koren, and Chris Volinsky. 2008. Collaborative Filtering for Implicit Feedback Datasets. In 2008 Eighth IEEE International Conference on Data Mining. IEEE, 263–272.
[18] Dietmar Jannach, Malte Ludewig, and Lukas Lerche. 2017. Session-Based Item Recommendation in e-Commerce: On Short-Term Intents, Reminders, Trends and Discounts. User Modeling and User-Adapted Interaction 27, 3–5 (Dec. 2017), 351–392. https://doi.org/10.1007/s11257-017-9194-1
[19] Kalervo Järvelin and Jaana Kekäläinen. 2002. Cumulated Gain-based Evaluation of IR Techniques. ACM Transactions on Information Systems (TOIS) 20, 4 (2002), 422–446.
[20] Anil Kaul and Dick R. Wittink. 1995. Empirical Generalizations about the Impact of Advertising on Price Sensitivity and Price. Marketing Science 14, 3_supplement (1995), G151–G160.
[21] Gary King and Langche Zeng. 2001. Logistic Regression in Rare Events Data. Political Analysis 9, 2 (2001), 137–163.
[22] Thomas N. Kipf and Max Welling. 2016. Semi-Supervised Classification with Graph Convolutional Networks. arXiv preprint arXiv:1609.02907 (2016).
[23] David G. Kleinbaum, K. Dietz, M. Gail, Mitchel Klein, and Mitchell Klein. 2002. Logistic Regression. Springer.
[24] Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix Factorization Techniques for Recommender Systems. Computer 42, 8 (2009), 30–37.
[25] Wei-Yin Loh. 2014. Fifty Years of Classification and Regression Trees. International Statistical Review 82, 3 (2014), 329–348.
[26] James MacQueen et al. 1967. Some Methods for Classification and Analysis of Multivariate Observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1. Oakland, CA, USA, 281–297.
[27] Christopher D. Manning, Hinrich Schütze, and Prabhakar Raghavan. 2008. Introduction to Information Retrieval. Cambridge University Press.
[28] Barak A. Pearlmutter. 1995. Gradient Calculations for Dynamic Recurrent Neural Networks: A Survey. IEEE Transactions on Neural Networks 6, 5 (1995), 1212–1228.
[29] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830.
[30] Steffen Rendle. 2010. Factorization Machines. In 2010 IEEE International Conference on Data Mining. IEEE, 995–1000.
[31] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2012. BPR: Bayesian Personalized Ranking from Implicit Feedback. arXiv preprint arXiv:1205.2618 (2012).
[32] Douglas A. Reynolds. 2009. Gaussian Mixture Models. Encyclopedia of Biometrics 741 (2009).
[33] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-based Collaborative Filtering Recommendation Algorithms. In Proceedings of the 10th International Conference on World Wide Web. 285–295.
[34] Venkatesh Shankar, Arvind Rangaswamy, and Michael Pusateri. 1999. The Online Medium and Customer Price Sensitivity. Working Paper (1999).
[35] Brent Smith and Greg Linden. 2017. Two Decades of Recommender Systems at Amazon.com. IEEE Internet Computing 21, 3 (2017), 12–18.
[36] Jaime Teevan, Susan T. Dumais, and Eric Horvitz. 2005. Personalizing Search via Automated Analysis of Interests and Activities. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '05). ACM, New York, NY, USA, 449–456. https://doi.org/10.1145/1076034.1076111
[37] Flavian Vasile, Elena Smirnova, and Alexis Conneau. 2016. Meta-Prod2Vec: Product Embeddings Using Side-Information for Recommendation. In Proceedings of the 10th ACM Conference on Recommender Systems. 225–232.
[38] Kirk L. Wakefield and J. Jeffrey Inman. 2003. Situational Price Sensitivity: The Role of Consumption Occasion, Social Context and Income. Journal of Retailing 79, 4 (2003), 199–212.
[39] Matei Zaharia, Reynold S. Xin, Patrick Wendell, Tathagata Das, Michael Armbrust, Ankur Dave, Xiangrui Meng, Josh Rosen, Shivaram Venkataraman, Michael J. Franklin, Ali Ghodsi, Joseph Gonzalez, Scott Shenker, and Ion Stoica. 2016. Apache Spark: A Unified Engine for Big Data Processing. Commun. ACM 59, 11 (Oct. 2016), 56–65. https://doi.org/10.1145/2934664
[40] Yu Zheng, Chen Gao, Xiangnan He, Yong Li, and Depeng Jin. 2020. Price-Aware Recommendation with Graph Convolutional Networks. In 2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE, 133–144.